INTRODUCTION


This volume is a report of the results of work done by the
Institute of Educational Research of Teachers College,
Columbia University, to provide tests for use in the vocational
guidance of children in their early teens. The particular problem
was to select or devise tests (1) that would be of value in
predicting fitness for various careers, (2) that could be given a)
to children in fairly large groups, b) by any intelligent teacher or
social worker who would give a reasonable amount of time to
training for the work, and c) within a time limit of three hours;
and (3) that could be prepared and scored cheaply.

It seemed best to divide the three hours of test time (two hours 
of actual working time for the children) somewhat equally among 
three abilities: (1) the ability to deal with ideas and symbols 
for ideas; (2) the ability to deal with things and mechanisms; and 
(3) the ability to deal with clerical items and procedures. (We 
made no attempt to test the ability to deal with people.) These 
three abilities correspond roughly to three of the trunk lines of 
vocational activities which a fifteen-year-old may enter. He 
may stay in school, or he may learn a trade, or he may do office 
work. A fourth main line is selling. Tests to predict fitness for 
selling are being made the subject of extended studies by the 
Carnegie Institute of Technology. 

The tests finally chosen are as follows: 

“Ability with ideas”
    The I.E.R. Arith.-Re. Test or any of the standard tests of
    general intelligence.

“Ability with things”
    For boys, the Stenquist Assembly Test.
    For girls, the I.E.R. Assembly Test.

“Ability with clerical items and procedures”
    a) High level, the I.E.R. General Clerical Test, C-1.
    b) Lower level, the I.E.R. Test C-2.


These can be given to a group of one hundred children within the 
time specified, at a money cost of about thirty dollars for mate- 
rials, a time cost of ten hours of one trained person and twenty 
hours of each of four slightly trained helpers. 

We cannot as yet determine how great value the child’s record
in these tests will have in predicting his fitness for vocations, 
general or special, or in guiding him in the choices which he has 
to make. The only satisfactory way to determine their value is 
by actual trial and prolonged study of the correspondence be- 
tween the predictions made by the test scores and the actual 
future life histories of the children tested. A thousand boys 
and a thousand girls of the graduating elementary school grade 
are now being tested, and will be followed in school and industry 
as fully and as far as possible. 

We have, however, done what was feasible to estimate the pre- 
dictive value of these tests in advance, first by investigations of 
the extent to which the separate tests do measure different fea- 
tures of human nature, and second by checking them against 
criteria of vocational success. The different studies made for 
this purpose are described in detail in the body of the report. 
The gist of our findings is as follows: 

Our test for “ability with ideas” is satisfactory. It does
correspond with success in school work and book-learning in
general. A boy or girl who scores well in it is found to have done well
and to be doing well in book-learning. By taking more time and
repeating the test, the correspondence and prophecy can be made
still more precise. It has the advantage over previous tests of
ability with ideas of being far less subject to special practice or
unfair coaching. It can be used helpfully along with any of these.

The boys’ test for “ability with things” seems as satisfactory
as any that is likely to be devised with present knowledge. It
measures a distinct, and probably an important, feature of human
ability. The score obtained in it corresponds well with success
in shop work in schools. The same is true of the girls’ test for
“ability with things.” We have been unable to check either test
against actual industrial success “on the job” or to ascertain
how free they are from disturbing influences from special practice
or coaching. Probably it will be very hard to make any tests of
mechanical ability that are not susceptible to these influences.
We are confident, however, that any competent psychologist or
employment manager or vocational counsellor would consider the
scores attained in these tests a valuable part of a personnel record
for boys and girls in the early teens.

The two tests of “ability with clerical items” have been the
subject of extended study, but the results are not clear. The
higher-level test is indicative (when its elements are properly
weighted) of ability to succeed in training for stenography,
typewriting, and bookkeeping in business schools. It is somewhat,
but less, indicative of success among actual office workers. Its
value in both of these respects may, however, be in considerable
measure due to its relation to general intelligence; and in children
from thirteen to fifteen it seems to be very largely a test of
general intelligence.

The lower-level clerical test measures powers that are more 
distinct from general intelligence. It indicates success among 
actual office workers about as closely as the higher level tests. 

It is impossible to decide from the previous work by Thurstone, 
Ruggles, Thorndike and others, how far the mental abilities which 
function in clerical work are simply lesser degrees of those re- 
quired for success in schools, professions and thought-work in 
general, and how far they are different specialized abilities. 
The present investigation leaves the question still open. An 
answer should come from the future careers of the two thousand 
children being tested and followed. 

At all events, the two clerical tests are worth the slight time 
and expense required to give them, because the higher-level test 
is a useful check on the test for ability with ideas, and because 
low scores in both mean deficiency in ability for clerical work 
by any reasonable hypothesis. One of the greatest services of 
vocational guidance to children from thirteen to sixteen is to 
direct away from commercial high school, business colleges, and 
office work those who have little or no chance of usefulness and 
happiness there. It will be very much safer to do this by the aid 
of the clerical tests than on the basis of an intelligence test alone. 

In view of the practical conditions under which vocational guid- 
ance is now, and probably for some years will be, given, we pre- 
pared tests for a three-hour period. Our studies show beyond 
question that a much longer time is desirable. We can measure 
a fourteen-year-old’s physical stature in a few seconds and with 
an error of only one per cent of the difference between the short- 
est and the tallest of his age, but to measure his stature in general 
intelligence with an error of one per cent of the difference between 
the dullest and the brightest fourteen-year-old would require 
at least three hours and probably more. 

Still more time seems to be needed for a precise measure of
ability with things and mechanisms. A thirty-minute test is 
enormously better than nothing, but it is also enormously worse 
than an accurate measure. A wide sampling of tasks is needed 
because a boy or girl may have ability in one sort of mechanical 
task and not in another. 

A sampling of days is needed because a child has his ups and 
downs. Time is also required to make sure that he understands 
what he is to try to do, and to free him from fear, excitement, 
and confusion. It is sound practice to give in every case at least 
two tests for any ability, preferably on two different days; to 
this end we have chosen or devised tests which can be extended 
by alternative forms of equal difficulty and like significance. 
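
The gain to be expected from repeating a test, or from adding an alternative form, can be roughed out with the Spearman-Brown formula, a standard result in mental measurement which this report does not itself invoke at this point. The sketch below assumes that the added forms are strictly parallel; the reliability of .70 for a single form is a hypothetical figure chosen only for illustration.

```python
def spearman_brown(single_form_reliability: float, n_forms: float) -> float:
    """Predicted reliability when a test is lengthened to n_forms parallel forms."""
    r = single_form_reliability
    return n_forms * r / (1.0 + (n_forms - 1.0) * r)


# Hypothetical reliability of a single form (an assumption, not a reported figure).
ONE_FORM = 0.70

for k in (1, 2, 4):
    print(f"{k} form(s): predicted reliability {spearman_brown(ONE_FORM, k):.3f}")
```

Under these assumptions the second form adds a good deal and each further form adds less, which is one reason for preferring two sittings on different days to a single long one.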

Unfortunately, limitations of time and funds and the reluc- 
tance of those in charge of schools, factories, and offices to set 
aside the required testing time, have prevented us from following 
this practice throughout in our own studies. We have given few 
retests, though we have given them wherever it was feasible. 

In connection with preparing the tests and finding out what 
may be expected of them in the measurement of fitness for or 
the prediction of success in various vocations, we have had to 
consider many facts and problems, including technical matters 
of arrangement, administration and scoring, mathematical 
methods in multiple correlation, and certain general principles of 
mental measurement and of vocational guidance. Our findings 
concerning these will be of interest to all students who seek 
mastery in this field. 


CHAPTER I
THE PROBLEM 


The scope of inquiry of the research has been limited to a deter- 
mination of the value of certain instruments, particularly tests, 
which have been either used or suggested as desirable instruments 
upon which to base vocational advice. Additional tests have 
been devised to meet needs discovered during the investigation. 
In the investigation covered by this report no extensive attempt 
has been made to give vocational advice nor to determine the 
efficiency of any advice given, nor even primarily to devise a 
practical working basis for giving such advice. 

A study of the literature shows that investigators in the past 
have devoted their attention almost exclusively to two types of 
vocational test investigation. 

The first of these, probably most adequately typified by the 
work of the Army Trade Test Division, has dealt with the effi- 
ciency of tests in measuring acquired vocational proficiency as a 
basis of probable fitness for an immediate position in industry. 
In content such tests have not all been primarily trade tests. A 
great many other varieties of tests have also been tried out by 
other investigators. One of the most notable pieces of such work 
is the extensive researches reported by Dr. Link in his book on 
Employment Psychology. He makes use of psychological tests 
specially selected by means of their correlations with demonstrated 
degrees of efficiency when administered to workers of known 
ability, to determine what tests should be used in the employ- 
ment office to select applicants who presumably would develop 
worthwhile industrial capacity. 

The second trend is but an amplification in degree of the work 
of Link, wherein tests selected by varying means, frequently by 
random selection, have been tried out upon groups of workers; 
the efficiency of the tests in predicting the known different de- 
grees of skill of the workers being then reported. Such tests vary 
in content from those which, by analogy, resemble the occupa- 
tional activity, to tests which are designed to measure general 
intelligence, reading, or other strictly educational content. 

A common feature of practically all such tests has been that
they were administered to the test subjects at a period after they 
had already gained some vocational proficiency. The validity 
of the conclusion that such tests are of high value, or otherwise, 
depends upon the validity of the assumption that similar results 
would have been obtained had the tests been given as true prog- 
nosis, or as hiring, tests at the time of entrance of the subject 
to the occupation. We should examine, then, the following more 
or less plausible hypotheses of the relationship of test scores to 
vocational proficiency: (1) That such proficiency as is exhibited 
on tests at a given test date after entrance to the occupation may 
be almost solely tests of acquired training, such a condition as 
would insure high correlations of vocational proficiency with 
tests which measure for the most part progress rather than 
native capacity; (2) that inasmuch as workers of varying lengths
of experience are frequently chosen for subjects, the correlations 
of tests with efficiency might be high because of the fact that the 
tests measure for the most part varying amounts of experience or 
practice and that such correlations would not hold were all 
subjects of the same experience level; (3) that the reverse might 
be true, that with equal amounts of experience the correlations 
of tests and efficiency might be higher than with unequal amounts; 
(4) that the relative scores of subjects at the time of the tests are
substantially the same as they would have been at the time of 
entrance to the occupation and that consequently the correlations 
would be just as valid at entrance as at the time of the test; in 
other words, that there is no practice effect; (5) that the tests 
measure native capacity only and consequently are quite valid 
for predicting potential progress; (6) that inasmuch as elimination 
of a certain percentage of the “unfit” has presumably been made
before the time of administering the tests, the correlations of the 
“survivors” are smaller than they would have been if computed 
upon the “applicants,” and consequently the published validity 
coefficients are very conservative judgments of the real value of 
the tests. 
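
Hypothesis (2) can be examined, wherever length of experience has been recorded, by holding experience constant with a first-order partial correlation. The sketch below uses the standard formula for that coefficient; the three correlations fed to it are hypothetical values and are not results from any study cited in this report.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical inputs: x = test score, y = rated efficiency, z = experience.
r_test_efficiency = 0.50
r_test_experience = 0.60
r_efficiency_experience = 0.70

r_partial = partial_correlation(
    r_test_efficiency, r_test_experience, r_efficiency_experience
)
print(f"test vs. efficiency, experience held constant: {r_partial:.3f}")
```

With these inputs most of the apparent validity disappears once experience is held constant, which is the situation hypothesis (2) describes. Hypothesis (3) corresponds to the opposite case, in which the partial correlation comes out higher than the raw one.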

In the absence of conclusive evidence bearing upon the above 
possibilities, it is highly desirable to test a large group of subjects 
previous to their entrance to industry, under actual conditions 
of guidance, and to follow up the careers of such pupils in order 
to determine whether the tests have functioned to advantage. 
This work will be carried on during the coming year with the
assistance of an additional grant made by the Commonwealth 
Fund. 

Vocational and educational guidance, to be most effective, 
should be given at some period in the elementary school. There 
still remains the question of whether or not tests given during 
the elementary school period will predict capabilities after the 
pupils have grown up and entered industry, i.e., at what age does
capacity develop to the point where its future course can be suc- 
cessfully predicted? It might possibly be the case that tests 
would predict vocational capacities if administered after the sub- 
jects had passed the age at which intelligence is popularly sup- 
posed to cease developing, but that the test would be useless if 
given before this age had been reached. This view would require 
that specialized aptitudes be developed after general intelligence 
reaches its maximum. Some light would also be expected to be 
thrown upon the answer to this question by an extended follow- 
up investigation which would continue over a period of years. 

It has been the specific purpose of this inquiry to investigate 
the value, as a means of predicting school and vocational ability 
of people already occupied in those vocations, of the following five 
scales: The Stenquist Mechanical Assembly Test; the I.E.R. 
Girls’ Assembly Test; the I.E.R. General Clerical, or High Level 
Business Test; the I.E.R. Clerical Test C-2, or Low Level
Business Test; and an intelligence scale composed of arithmetic and
reading, to be described later. The use of these tests in a voca- 
tional guidance program should obey the testing principle of 
securing the maximum predictive value with minimum exertion. 
This will be accomplished most easily whenever adequate criteria 
of future progress ability in a vocation have been secured, by 
weighting each of the tests above named (or others which may 
be used in a vocational guidance program) according to its 
independent contribution to the particular vocational criterion 
of the vocation for which one is attempting to determine a given 
individual’s fitness. If it should be true that high fitness for a 
clerical vocation, other things being equal, means low fitness 
for a mechanical one, a negative weight of the mechanical test 
score in a clerical scale would be the statistical outcome of the 
application of the method customarily used in such cases. This 
would yield a marked differentiation between the different 
occupational groups. In any case, each test could be weighted
according to its contribution in predicting each of a great many 
different criteria of occupational success. One advantage to be 
gained is that one could avail himself almost without additional 
effort of the greater predictive value of additional tests. Two 
tests are always better than one, if properly weighted with respect 
to each other. Save by the merest chance, no test correlates zero 
with an occupational criterion, the usual expectancy being that 
the correlations will be positive with almost any type of test 
administered. 
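
The weighting described above can be illustrated for the simplest case of two tests and one criterion. The sketch below uses the standard formulas for standardized regression weights and the multiple correlation; the three correlations are hypothetical values, and the labels "clerical" and "mechanical" are attached to them only for illustration.

```python
import math


def two_test_weights(r1: float, r2: float, r12: float):
    """Standardized regression weights and multiple R for two tests.

    r1, r2: correlation of each test with the criterion.
    r12:    correlation between the two tests.
    """
    denom = 1.0 - r12**2
    beta1 = (r1 - r2 * r12) / denom
    beta2 = (r2 - r1 * r12) / denom
    multiple_r = math.sqrt(beta1 * r1 + beta2 * r2)
    return beta1, beta2, multiple_r


# Hypothetical correlations (assumptions, not figures from this report).
r_clerical_criterion = 0.50    # clerical test vs. clerical criterion
r_mechanical_criterion = 0.10  # mechanical test vs. clerical criterion
r_between_tests = 0.40         # clerical test vs. mechanical test

beta_clerical, beta_mechanical, multiple_r = two_test_weights(
    r_clerical_criterion, r_mechanical_criterion, r_between_tests
)
print(f"weight of clerical test:   {beta_clerical:+.3f}")
print(f"weight of mechanical test: {beta_mechanical:+.3f}")
print(f"multiple correlation:      {multiple_r:.3f}")
```

In this hypothetical case the mechanical test receives a negative weight although its own correlation with the criterion is positive, and the multiple correlation is somewhat higher than that of the clerical test alone.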

A clerical test will thus predict to some extent ability to pro- 
gress in a rather mechanical vocation and vice versa; so likewise 
will a reading test predict to some extent one’s ability in school 
work, in a mechanical vocation, or in the most varied or the 
most routine clerical work. At the present time our criteria are 
ordinarily quite too inaccurate to justify more than tentative 
conclusions. It has been our purpose to attempt to determine 
some of the basic relationships involved in the above named tests. 
Refined statistical techniques have been used where desirable, 
and crude comparisons where economical; in some cases we have 
not hesitated at hazarding guesses where sufficient evidence is 
not available. 

If we should come to a valid conclusion regarding the extent 
to which high mechanical capacity means likewise high clerical 
capacity or high academic school capacity and vice versa, the re- 
sults would be of tremendous value for vocational guidance. If 
we could settle the question of to what extent one must have a 
high degree of general intelligence or of education in order to 
enter the different levels of clerical work, that likewise would 
be an important conclusion. The account of the various re- 
searches undertaken in an attempt to settle some of these ques- 
tions will be found in the following pages. 

It is assumed that vocational guidance is a function of the 
public elementary school for several reasons: (1) A pupil does 
not ordinarily enter a vocation until after his compulsory or vol- 
untary education is completed; (2) guidance should be given in 
advance of the immediate situation, such as school elimination, 
which plunges the boy or girl into industry; (3) the facts of school 
elimination show that if guidance is not given in the elementary 
school, a majority of boys and girls will choose a vocation without
such advice, since but a small portion enter high school;
(4) early differentiation of school work in the case of some boys 
and girls seems desirable; (5) the elementary school has control 
at some time over all boys and girls, has their trust and confi- 
dence, can and should supply training in abilities in which it 
may be noted the individual is not as well developed as he should 
be, and is financially disinterested; (6) the elementary school 
collects more valuable data on the abilities of growing children 
than any other agency. 


CHAPTER II 
TESTS OF ABILITY WITH IDEAS AND SYMBOLS 


Persons giving vocational guidance to children from 13 to 16 
years of age will as a rule have access to the children’s school 
records. Such facts as the grade reached at a certain age or 
within a certain number of years of schooling, the standing 
attained in comparison with others in the teacher’s estimation, 
and the scores made in standard tests of educational achievement, 
—these and other matters of school record are predictive of later 
success in school and to some extent in vocations. This may be 
illustrated from the work of Kelley,1 Miles,2 and Ross.3

Kelley 1 had as the object of his investigation, “the utilization
of measures obtainable under ordinary classroom conditions, with
whatever errors may be inevitable, for whatever they actually
demonstrate themselves to be worth as evidence of the capacity
it is desired to measure.” He concludes (p. 84): “It will be
found that having once initiated a guidance bureau, the demands 
upon it will be positive and innumerable—many of them extrava- 
gant. In the attempt to meet these demands, and to meet them 
on the spot, and without a moment’s delay, one of the richest 
sources of information is likely to be only very partially utilized. 
Reference is made to that product accumulated by every pupil— 
school grades.” 

1 Kelley, T. L. Educational Guidance. 116 pp. Teachers College Contributions
to Education, No. 71, 1914.

2 Miles, W. R. Comparison of Elementary and High School Grades. Univ. of Iowa
Studies in Education, Vol. I, No. 1.

3 Ross, C. C. “The Diagnostic Value of Individual Record Cards.” To appear
shortly in the Journal of Administration and Supervision.

In the case of 59 pupils whose marks were available from the
fourth grade through the first year of high school, he finds that
the average grades in the first year high school may be predicted
from a regression weighting of the average marks in grades 4, 5, 6,
7, 8, to the extent of r=.79±.03. In specific subjects, first-year
high school English marks may be predicted from English marks
for grades 4, 5, 6, 7, 8, to the extent of r=.71±.04, while
mathematics similarly measured yields r=.58±.06. Kelley combines
elementary school marks, teachers’ estimates, and special tests of
33 pupils and predicts first year high school standing by the
regression equation with an efficiency, r=.89±.02. He says
(p. 70): “This very high correlation is of interest in showing the
stability of individual character.”
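
The small figure attached to each of these correlations appears to be its probable error, following the usual practice of the period. A minimal sketch, assuming the customary formula PE = 0.6745(1 - r²)/√n, reproduces the reported values from the numbers of pupils given in the text; the function name is illustrative only.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of a correlation r computed from n cases."""
    return 0.6745 * (1.0 - r**2) / math.sqrt(n)


# Correlations and numbers of pupils as quoted from Kelley in the text.
cases = [
    ("average marks, grades 4 to 8", 0.79, 59),
    ("English marks", 0.71, 59),
    ("mathematics marks", 0.58, 59),
    ("marks, estimates and tests combined", 0.89, 33),
]

for label, r, n in cases:
    print(f"{label}: r = {r:.2f}, probable error = {probable_error(r, n):.2f}")
```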

He further finds (p. 13): “Indeed it seems that an estimate of a
pupil’s ability to carry high school work when the pupil is in the 
fourth grade may be nearly as accurate as a judgment given when 
the pupil is in the seventh grade, for the correlation in the former 
case is .62 and in the latter only .10 higher.” 

Miles 1 finds that the correlation between average elementary
school grade and the average high school grade is .71. This is
quite in harmony with the results of the Kelley study and it is
probable that the Miles data, treated by the regression equation
method, would yield correlations between .80 and .90.

C. C. Ross,2 in a preliminary investigation of the traits
possessed by 46 high school graduates, has found that in so extremely
selected a group as the graduating class of the senior high school,
average marks in high school work of such pupils can be predicted
to the extent of a correlation of .74±.05 by means of a proper
weighting of the school marks and absences, which had
accumulated in the principal’s office, up to the time of graduation from
the eighth grade. The factors of most importance in this case are
the school marks made in the school subjects throughout the
grade school career.

The eighth grade is very much more limited in range of ability 
than is an age group; this range of ability is presumably greatly 
decreased by the fact that only a portion of the eighth grade 
graduates enter high school; the range of ability of the entering 
high school students is further delimited by reason of the large 
elimination which takes place between the entrance to high school 
and graduation. If one could adequately allow for the change in 
range of ability involved in all of the above, the correlation 
coefficient of .74 would undoubtedly be considerably increased. 
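
The size of such an allowance can be sketched with the customary correction for restriction of range, which the report does not carry out here. The formula assumes that selection acted directly on the predictor; the ratio of 1.5 between the unrestricted and the restricted standard deviations is a hypothetical value, since no such ratio is reported.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the correlation in the unrestricted group.

    sd_ratio is the predictor's standard deviation in the unrestricted
    group divided by its standard deviation in the selected group.
    """
    r = r_restricted
    k = sd_ratio
    return r * k / math.sqrt(1.0 - r**2 + (r**2) * (k**2))


r_observed = 0.74         # correlation quoted in the text
assumed_sd_ratio = 1.5    # hypothetical; not given in the report

r_estimated = correct_for_range_restriction(r_observed, assumed_sd_ratio)
print(f"estimated unrestricted correlation: {r_estimated:.3f}")
```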

An intelligence test should be used to supplement, not to 
replace, an educational history. The two together will make a 
better prediction than either alone. 

1 Miles, op. cit.
2 Ross, op. cit.

Any efficient test of ability with ideas will serve as such an
intelligence test. The value of its contribution in the prediction
of success in school work has been made clear by many workers.
As an illustration, we may take the work of Mann 1 and Thorndike.

They studied the qualifications of engineering students in the
case of 34 Columbia University freshmen in engineering. The
correlation between a weighted composite of seven tests and a
composite criterion of intelligence was .87±.03; and for fifteen
weighted tests, .97±.01. This unusually high result was due to
an extremely large range of ability, such as is seldom found. The
seven tests consumed five hours’ time in administration. Five
of the seven selected tests were mathematics tests, and the
remaining two were sentence completion tests. The same fifteen
tests when given to 41 engineering freshmen at the University of
Cincinnati correlated with “academic achievement” to the extent
of .64±.06. The first year college ratings correlate .62±.07
with the second year college ratings; in other words, the tests
predict the first year marks equally as well as the first year marks
predict the second year marks. When given at the Massachusetts
Institute of Technology to 40 freshmen in engineering, the
correlation with academic rating was .49±.08, while the first year
marks correlated with the second year marks to the extent of
.64±.06. These are all engineering groups.

Entrance to the professions in general requires now a high
school graduation as a prerequisite, and we have made an
elaborate study, reported elsewhere,2 of the significance of a person’s
score in a standard test of ability with ideas as an indication of
how likely he is to progress to high school graduation. The
following summary of Miss Cobb’s findings will indicate the
significance of the Army Alpha scores in this connection.


1 Mann, C. R. A Study of Engineering Education. Carnegie Foundation.
Bulletin No. 11, 1918.

2 Cobb, M. V. “The Limits Set to Educational Achievement by Limited
Intelligence.” Journal of Educational Psychology, Vol. 13, Nos. 8 and 9
(Nov. and Dec., 1922).

The high school population in this country is limited to
approximately the upper half of the whole range of American intelligence,
as measured by Alpha. This means an Alpha score of 65 up.
Children who, at 14 years of age, cannot score more than 65 are
not likely even to enter high school. For success in a
non-academic course in which the subjects are for the most part
definitely less difficult than algebra, the ability to score at least
85 to 100 is usually necessary. Our Michigan data give evidence
that, of the freshmen who score 77 or less on Army Alpha, about
87 per cent drop out before the senior year. Madsen’s data
confirm this; about 84 per cent of the freshmen who score 77 or
less are eliminated. This means that only about one in seven
remains to graduate, unless the Alpha score is better than 77.
Similarly, the Army data show that when the recruits were in
school, 10 years or more ago, 78 per cent of those scoring below 85
in later life had not remained in high school to graduate. In the
group of commissioned officers in which other character traits
were on the whole very high, the elimination among those scoring
below 85 was only 24 per cent. This, however, is more than
twice the proportion dropped from this officer group, as a whole,
i.e., those scoring above, as well as below, 85; only 10 per cent of
the entire group failed to graduate.
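
The proportions quoted in this summary can be checked by simple arithmetic. The sketch below only restates the percentages given above; it adds no data, and the variable names are illustrative.

```python
def one_in(survival_fraction: float) -> float:
    """Express a survival fraction as 'one in N'."""
    return 1.0 / survival_fraction


# Percentages as quoted in the summary above.
michigan_dropout = 0.87
madsen_dropout = 0.84
print(f"Michigan: one in {one_in(1 - michigan_dropout):.1f} remains")
print(f"Madsen:   one in {one_in(1 - madsen_dropout):.1f} remains")

# Officers scoring below 85 compared with the officer group as a whole.
low_scoring_officers = 0.24
all_officers = 0.10
officer_ratio = low_scoring_officers / all_officers
print(f"Elimination ratio among officers: {officer_ratio:.1f}")
```

Both dropout figures leave between one in six and one in eight pupils, in keeping with the “about one in seven” of the summary, and the officer ratio bears out “more than twice.”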

For a strictly academic course, including algebra, success, i.e., 
profit to the child, is doubtful for a child who at 14 cannot score 
100 to 110 or better on the Army test. This means a mental age 
of 15-6 to 16-2 and (at 14 years) an I.Q. of 110 to 115. Proctor 
mentions 67 as a general minimum, but this seems to us rather 
low; it means an intelligence quotient of 95. Probably nine 
times out of ten it is unwise to guide the average, or less intelligent 
than average, child into the present academic high school. 
Unless his I.Q. is over 100, or his mental age over 14, he should be 
encouraged to try some other type of training. 
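
The mental ages and quotients quoted here are related by the ratio definition of the I.Q., mental age divided by chronological age and multiplied by 100. A minimal sketch, assuming that definition and reading “15-6” as 15 years 6 months:

```python
def intelligence_quotient(mental_years: int, mental_months: int,
                          age_years: int, age_months: int = 0) -> float:
    """Ratio I.Q.: 100 * mental age / chronological age, both in months."""
    mental = mental_years * 12 + mental_months
    chronological = age_years * 12 + age_months
    return 100.0 * mental / chronological


iq_low = intelligence_quotient(15, 6, 14)    # mental age 15-6 at age 14
iq_high = intelligence_quotient(16, 2, 14)   # mental age 16-2 at age 14
print(f"{iq_low:.1f} to {iq_high:.1f}")
```

The results fall close to the “110 to 115” given in the text, and a mental age of exactly 14 at the age of 14 gives a quotient of 100, which is why the two conditions in the closing sentence are equivalent for a fourteen-year-old.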


The value of a score in ability to deal with ideas as a means of
predicting fitness for the actual work of a vocation will, of course,
depend upon what the vocation is. Doubtless the correlation is
in general positive, and in some cases high. Knight 1 found
among a group of superior high school teachers, a “corrected for
attenuation” correlation of .57 between score in such a
sixty-minute test and reputed success as teachers, and we may
therefore estimate conservatively that if ten thousand
fourteen-year-olds were taken at random, tested with such a test, trained to be
high school teachers and tried at the job, the correlation between
the test score and their future success as high school teachers
would be well over .90.
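
The correction for attenuation mentioned above divides an observed correlation by the square root of the product of the reliabilities of the two measures. The sketch below shows the computation only; the observed correlation and both reliabilities are hypothetical values and are not Knight's figures.

```python
import math


def correct_for_attenuation(r_observed: float,
                            reliability_x: float,
                            reliability_y: float) -> float:
    """Estimated correlation between two measures freed of chance error."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values (assumptions for illustration only).
observed = 0.45          # test score vs. rated success
test_reliability = 0.81
rating_reliability = 0.64

corrected = correct_for_attenuation(observed, test_reliability, rating_reliability)
print(f"corrected correlation: {corrected:.3f}")
```

The less reliable the ratings of success, the larger the correction, so a modest observed correlation may conceal a considerably closer true relationship.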

On the other hand, there are industrial careers where ability 
to deal with ideas is probably a very minor factor in success. 

1 Knight, F. B. Qualities Related to Success in Teaching. 67 pp. Teachers
College Contributions to Education, No. 120, New York, 1922.

The relationship between general intelligence test scores and
specific occupational aptitudes may be low. This is illustrated
by two studies reported in the Memoirs of the National Academy of
Science, Vol. XV.1 The Pearson association coefficient, C,
between letter ratings secured on army trade tests and letter ratings
secured on Army Alpha, is shown for four groups below. Of
these four groups, the truck drivers were tested on a performance
test, while the other three groups were tested on oral trade tests.


                          NUMBER
OCCUPATION                OF CASES       PEARSON C
Auto repairman            666            .265
Machinist                 [illegible]    .254
Horse hostler             200            .277
Heavy truck driver        644            .130


The above results are complicated by varying amounts of experi- 
ence. The conclusions derived from the detailed tables from 
which the above was constructed are: 

It would seem that the function of intelligence plays a varying 
part in its relation to degrees of skill as classified by trade tests. 
For example in the case of general auto repairmen, we seem to
find a difference (in Alpha scores) only between the apprenticeship 
level and the higher levels of skill. In the case of machinists we 
find the difference only between the expert level and the two lower 
levels. In the case of truck drivers we find no significant differ- 
ences between levels of skill, although intelligence is demonstrably 
a factor in qualifying in the apprenticeship level of that trade. 
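
The Pearson coefficient C used in the table above is computed from a two-way table of letter ratings as C = √(χ² / (χ² + N)). The sketch below shows the computation on a hypothetical three-by-three table of counts; the counts are invented for illustration and are not taken from the Army data.

```python
import math


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts: rows are trade-test levels, columns are Alpha ratings.
ratings = [
    [30, 20, 10],   # lowest trade-test level
    [20, 20, 20],   # middle level
    [10, 20, 30],   # highest level
]
pearson_c = contingency_coefficient(ratings)
print(f"C = {pearson_c:.3f}")
```

Since C cannot reach 1.00 in a table with few classes, values such as those reported above understate the relationship somewhat when read as ordinary correlation coefficients.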


Of even more interest is a result of a study of a group of
graphotype operators, for whom, evidently, a very objective criterion
of efficiency was available. “In a group of 106 graphotype
operators of the Treasury Department, the median Alpha score
is 75 with extremes of 11 and 174. The median of the average
daily output of plates by this group is 245 with a median error of
2.9 per cent. The highest individual average is 391 plates per
day, the lowest, 113. The correlation between output and
accuracy is .113±.06, and between Alpha and accuracy, .019±.06,
and between Alpha and output, -.087±.06. The returns are of
special interest in that they exhibit such low correlations between
intelligence and accuracy and speed in mechanical work.”

1 Pp. 835-37.
2 Otis, A. S. “The Selection of Mill Workers by Mental Tests.” Journal of
Applied Psychology, Vol. 4, No. 4, pp. 339-41 (1920).

Otis has similarly found “zero” correlation between “productive
ability ascertained by careful investigation” and intelligence
in a group of 400 employees of a silk mill, and comes to the conclusion
that “intelligence is not only not required in a modern silk
mill for most operations, but may even be a detriment to steady,
efficient, routine work. What qualities are required, remain to be
sought. Whether they are measurable is doubtful. They may
be stolidity, inertia of attention, regularity of habits, etc.” A
number of factors serve to attenuate his results, yet the essential
conclusion of his research clearly points out the small rôle which
general intelligence may have in factory work.

On the other hand, Link¹ has found very significant relationships
between specialized tests and operations which are no more
complex than those involved in silk mill operations.

E. M. Martin, in an unreported investigation of the talents of
30 policemen, found a correlation of .80 ± .04 between a) a criterion
composed of a composite of one ranking and one rating of patrolmen
by each of four commanding officers, and b) a statistically
selected weighted composite of intelligence and educational tests
and physical and social traits.


THE I.E.R. ARITH.-RE. TEST


Any one of the better tests of so-called general intelligence will 
serve the purpose of vocational guidance of children from 13 to 16. 
We suggest the use of a combination of the Thorndike-McCall 
Reading Test and the test in Arithmetical Problem-Solving. 
This combination of tests is referred to in this study as the I.E.R. 
Arith.-Re. Test. This makes an intelligence test which is easy to 
give and easy to score, which all the children understand, which is 
remarkably free from harmful effects of special practice, and 
which is, or may be made, uncoachable, since there already 
exist ten alternative forms of it, and hundreds more can be 
made. 

1 Link, H. Employment Psychology. 440 pp. Macmillan Company, 1919.

The value of many of the current intelligence scales is impaired
by reason of their high susceptibility to improvement through
practice. Ratings from such a scale will generally become higher
and higher as successive forms of the test are taken by the test
subject. This variation in ratings due to practice improvement
does not hold in the case of arithmetic and reading, ratings from
which are more nearly constant, due to the fact that at any one
particular time both reading and arithmetic have been almost 
maximally practiced. If a weighted combination of a good arith- 
metic and a good reading test would correlate as highly on the 
average with the present intelligence scales as the average corre- 
lations of such intelligence scales with all others, then we would be 
justified in assuming that for practical purposes such a test is as 
valid an intelligence measure for the general school population as 
the current intelligence tests. This makes the assumption, which 
has abundant substantiation in fact, that such a combination will 
correlate as highly with ability to progress in school as will the 
intelligence tests. Whether such a combination correlates with 
the current intelligence scales is an aspect of reliability; whether 
it correlates with an adequate measure of school progress ability 
is the corresponding aspect of validity. Naturally one-half of 
total school work, if adequately measured, correlates more highly 
with the other half of school work itself than will any possible 
practical combination of opposites tests, completion tests, and the 
like. A measure of at least two school abilities is desirable for 
giving educational guidance. A knowledge of one’s reading and 
arithmetic abilities is valuable for the specialized abilities them- 
selves; if we can secure an intelligence rating from them gratis, 
that is just so much additional information. Such tests will be 
unfair to only those who have language difficulty. The work of 
Kelley¹ indicates that there is as much reason to believe that one’s
educational status remains approximately constant throughout 
his school career as that his intellectual brightness remains con- 
stant; this would be apparent if we could rid our minds of the 
arbitrary nature of the units at present employed in measuring 
brightness. 

1 Kelley, T. L. Educational Guidance.

The Weighting of Arithmetic and Reading for Use as an Intelligence
Score.—In the absence of any criterion of general ability to
progress in school work it was decided to weight arithmetic and
reading equally. A number of investigations have seemed to
indicate that in all probability reading should be weighted the
higher of the two, deficiencies in reading being of greater significance
for determining the rate of progress through school. However,
an undue weighting of reading will make the test less
reliable in the individual cases of pupils who have language
difficulty. This fact points to a possible desirability of not giving 
reading as much weight as might otherwise be accorded to it. 
It was, however, decided to weight arithmetic and reading equally. 
Since there are available a number of alternative forms of the 
Thorndike-McCall Reading Scale, each of which varies slightly 
from the others in difficulty, but which variation is theoretically 
adequately taken care of by using the T-score rather than the 
gross score made by each individual child, the T-scores were 
used as the basis for determining the variability of the Thorndike- 
McCall Reading Test. Such T-scores were not available for the 
Thorndike Arithmetical Problem-Solving Test so raw scores were 
used in determining its variability. For the 467 pupils in Public 
School B the standard deviation of reading T-scores was 10.52; 
the corresponding standard deviation for the Thorndike arith- 
metic raw scores was 3.40. If, then, we weight the raw scores of 
arithmetic 3 and the reading T-scores 1, arithmetic will be given 
a partial correlation importance of 97/100 of reading, giving 
reading a slightly greater weight. This enables the weighting to 
be very readily done since the multiplication of the raw scores of 
the arithmetic can be done mentally and no computation is 
required on the reading T-scores. The entire school involving 
all pupils in 6A (first semester of the sixth grade) and above, and, 
in addition, all boys 13 years of age and over in the entire school, 
was used to determine the variability for the ages to which one is 
likely to be giving vocational advice, namely 13, 14, and 15 years. 
The distribution of reading T-scores and arithmetic raw scores 
shows that the numbers of pupils who fail to grasp the directions 
of the tests is insignificant and also that there is a good spread of 
scores in these ages. The results of the intercorrelations of this 
composite variable, hereinafter called the “Arith.-Re.” Test, with
other well known measures of intelligence show that it correlates 
as well with such measures as do the general run of such measures 
correlate with each other. The intercorrelations of reading and 
arithmetic with half-year gains in school work and with C-1, 
General Clerical Test show that reading correlates slightly better 
than arithmetic with half-year gains, and that reading likewise 
correlates better with C-1. This holds true for both boys and 
girls of separate ages 12-15 inclusive, with but one or two
exceptions.

TABLE I

THE INTERCORRELATIONS BY AGE GROUPS OF INTELLIGENCE MEASURES
ADMINISTERED TO GIRLS, IN PUBLIC SCHOOLS G AND J, ON WHOM THE
RECORDS WERE COMPLETE*

[Table body not preserved in this copy.]

† Weighted average r found by weighting all ages 1, save age 15, which is weighted ½.
* The Probable Error of Correlation Coefficients in this table may be determined from the
following table: [table not preserved in this copy.]

Absence from school does not affect this intelligence
measure quite as much as it affects a pupil’s half-year gains as
shown by the correlation coefficients in Table V, page 22.
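
The arithmetic behind the 3-to-1 weighting can be checked directly. The following is a minimal sketch that uses only the two standard deviations reported above for the 467 pupils of Public School B (10.52 and 3.40); the function name and the sample pupil's scores are hypothetical and serve only as illustration.

```python
# Sketch: how multiplying the arithmetic raw score by 3 balances it against
# the reading T-score. The two standard deviations are those reported for
# Public School B; all names and the sample pupil are illustrative assumptions.

READING_T_SD = 10.52   # S.D. of Thorndike-McCall reading T-scores
ARITH_RAW_SD = 3.40    # S.D. of Thorndike arithmetic raw scores
ARITH_MULTIPLIER = 3   # weight applied to the arithmetic raw score


def arith_re_score(arith_raw, reading_t):
    """Composite score: 3 x arithmetic raw score + reading T-score."""
    return ARITH_MULTIPLIER * arith_raw + reading_t


# Spread contributed by each part of the composite.
arith_spread = ARITH_MULTIPLIER * ARITH_RAW_SD   # 3 x 3.40
reading_spread = READING_T_SD

# Arithmetic's weight relative to reading (the text gives 97/100).
relative_weight = arith_spread / reading_spread
print(f"relative weight of arithmetic: {relative_weight:.2f}")

# Hypothetical pupil: arithmetic raw score 12, reading T-score 55.
print("composite score:", arith_re_score(12, 55))
```

Because the multiplier is a small whole number, the composite can be formed mentally, which is the practical advantage the text claims for it.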

The Comparison of the Arith.-Re. Test with Standard Intelligence 
Tests—At Public School G a number of intelligence and educa- 
tional tests had been given in the fall of 1921 by Dr. J. L. Sten- 
quist, of the Bureau of Reference and Research of the New York 
City Schools. These scores were available for comparison with 
the corresponding intelligence scores determined in the present 
investigation. The records of 164 people, divided among five 
ages, were found complete for the following test variables: 


1. Half-year gains, an objective measure of school progress.

2. The I.E.R. Arith.-Re. Test, an intelligence measure.

3. The I.E.R. General Clerical Test, C-1, weighted with the
series of weights used in the Company I investigation, 3, 3,
3 [remaining weights illegible].

4. The I.E.R. Clerical Test, C-2, weighted.

5. Haggerty Delta 2 Intelligence Test.

6. The National Intelligence Test.

The intercorrelations of the six variables by the five age groups 
are shown in Table I, together with the weighted average of the 
five in each compartment. This weighted average is found by 
weighting the correlations of age 15, with small number of cases, 
1/2, and the four remaining ages, 1 each. At the foot of the 
table is shown the average of the weighted average correlations of 
the respective columns. These latter figures show that C-1,
Haggerty, and N.I.T. each correlate about .60 on the average
with all the others, while Arith.-Re., weighted C-2, and half-year
gains correlate less in descending order, respectively. Whereas 
Arith.-Re. does not correlate so highly (.56) with all the other 
intelligence measures, it correlates as highly with Haggerty (.72)
and N.I.T. (.69) as either Haggerty or N.I.T. correlates with
any of the other variables save Haggerty with N.I.T. (.76).
These correlations are all substantially about .70. It correlates
.31 with “half-year gains,” whereas N.I.T. correlates only .33
and Haggerty .43. 
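
The weighted average described above (age 15 counted one-half, every other age counted once) can be sketched as follows. The age labels other than 15 and all the correlation values are hypothetical, since the body of Table I is not available here; the sketch also assumes, as the text implies, that the coefficients are averaged directly.

```python
# Sketch of the weighted average of age-group correlations.
# Assumption: coefficients are averaged directly, as the text describes.

def weighted_average_r(r_by_age, weight_by_age):
    """Weighted mean of correlation coefficients, one per age group."""
    total_weight = sum(weight_by_age[age] for age in r_by_age)
    weighted_sum = sum(r * weight_by_age[age] for age, r in r_by_age.items())
    return weighted_sum / total_weight


# Age 15 has few cases, so it receives half the weight of the other ages.
weights = {11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5}

# Hypothetical correlations for one compartment of the table.
example_r = {11: 0.60, 12: 0.70, 13: 0.65, 14: 0.75, 15: 0.50}

average_r = weighted_average_r(example_r, weights)
print(f"weighted average r: {average_r:.2f}")
```

The half weight keeps a small, unstable age group from pulling the average as strongly as the larger groups do.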

These results are corroborated by a study of the correlations 
with “half-year gains” and “average work,” two measures of
school success in the case of age groups of all boys and girls on 
whom these data were available. The correlations were as 
follows: 


CORRELATION OF I.E.R. ARITH.-RE. SCORE

[Table body not preserved in this copy.]


CHAPTER III


TESTS OF ABILITY WITH THINGS AND 
MECHANISMS: BOYS’ TESTS 


We have experimented in various ways with the following tests: 


1. The Stenquist Assembly Test of Mechanical Ability, devised
by Dr. J. L. Stenquist and made by Stoelting and Co.,
referred to hereafter as the Stenquist Assembly.

2. The I.E.R. Assembly Test for Girls.

3. The Stenquist Mechanical Aptitude Test I, and

4. The Stenquist Mechanical Aptitude Test II, referred to hereafter
as the Stenquist Picture Tests I and II.

5. The Thurstone Manual Training Information Test.

6. The Army General Trade Test.

7. The Army Mechanical Interest Test, referred to sometimes as
the M.I.T. Test.


Concerning the relative merits of these tests in various respects 
we ask whether they measure an ability or group of abilities that 
is distinct from general intelligence; how well they may be ex- 
pected to predict success in mechanical work; how reliable they 
are, and how they should be given and scored to obtain satis- 
factory results economically. 


THE INTERCORRELATIONS OF MECHANICAL TESTS 


TABLE II

INTERCORRELATIONS BY AGE GROUPS OF MECHANICAL TESTS FOR 145 BOYS
OF PUBLIC SCHOOL B*

[Table body not preserved in this copy.]

* The Probable Error of the Correlation Coefficients in Table II may be found from the
following table: [table not preserved in this copy.]

In order to determine the intercorrelations among a number of
so-called mechanical tests used by different research workers, the
following six tests were administered to boys of the ages 12, 13, 14
and 15, of School B: The Thurstone Manual Training Test
(after deleting some twenty items bearing on machine shop
practice), the Stenquist Assembly Test, the Stenquist Picture I
and Picture II, the Army General Trade Test, and the Army
Mechanical Interest Test. The weighted average of the intercorrelations
of the four age groups was computed by weighting
age 15 one-half (on account of the small number of cases) and
ages 12, 13, 14, one each. It will be noted in the “Wtd. Av. r”
columns of Table II that the Stenquist Assembly correlates
highest with the Stenquist Picture I; Stenquist Picture I correlates
most highly with Stenquist Picture II, but only to the
extent of .56; General Trade Test correlates most highly with
Mechanical Interest Test (knowledge of the use of tools), .50, and 
next highest with the Thurstone Manual Training Test (true-false
questions about the use of tools and the properties of materials);
the Thurstone Manual Training Test correlates highest with the 
General Trade Test (recall questions about general mechanical 
information), .40. The average of these weighted average corre- 
lations, in the bottom row of the table, shows that the average 
intercorrelation of each of these tests with each of the others is 
about .40, save in the case of the Thurstone Manual Training 
Test, which correlates somewhat lower. This average intercorrelation
of .40 for mechanical tests devised by various investigators
is to be compared with the results of Table I which show that the 
average intercorrelation of a number of tests which are known to 
correlate rather well with general intelligence is about .60 for 
similar age groups of girls. This rather lower intercorrelation 
may indicate a greater confusion in the minds of the builders of 
mechanical tests in regard to what they are attempting to measure 
than in the case of the builders of intelligence tests; or, it may be 
merely the result of the lack of sufficient mechanical environment 
in the case of these boys to bring out their mechanical potentialities;
or lower intercorrelations of “mechanical” tests may be the
essential nature of such tests. In various other tests it has been 
found that practice improves the correlation between functions. 
It seems quite likely, once these boys are subjected to a more 
complicated mechanical environment, that the size of the corre- 
lations between these mechanical tests will increase. The me- 
chanical environment of a New York City boy is quite limited 
compared to that of a country boy, or one living in a small town. 

In Table III, the averages and standard deviations on the 
Stenquist Picture I, Stenquist Picture II, and Stenquist Assembly
Test are compared for 19 fifteen-year-old New York City boys of
Public School B, and 145 first-year high school boys of High
School R. School R is located in a small-sized city noted for the
excellence of its machine shop products; each boy in the eighth
grade had taken prevocational work in mechanical courses.


TABLE III

[...] BY 19 PUBLIC SCHOOL 15-YEAR-OLD NEW YORK CITY BOYS

                  HIGH SCHOOL R,           P.S. B, N.Y.C.,          DIFFERENCE OF THE
                  FIRST-YR. H. S. BOYS     15-YR.-OLD BOYS          AVERAGES IN FAVOR
TEST              Aver.        S.D.        Aver.        S.D.        OF HIGH SCHOOL BOYS
Sten. Pict. I     [illegible]  18.7        36.5         10.6        20.7
Sten. Pict. II    52.0         11.8        [illegible]  10.0        18.7
Sten. Assembly    71.2         16.8        52.4         19.9        18.8

Such differences as shown surely cannot be due to chance. It 
is obviously impossible to state how much of the superiority of the 
small city first-year high school boys on all of the three tests is 
due to different native capacity, different specific training, 
specific selection, or richer mechanical environment. 
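
The claim that such differences cannot be due to chance can be illustrated with the one row of the comparison whose figures are all legible, the Stenquist Assembly (high school boys: average 71.2, S.D. 16.8, N 145; New York City boys: average 52.4, S.D. 19.9, N 19). The sketch below is not the authors' own computation. It assumes two independent samples and the usual large-sample formula for the standard error of a difference, and it converts that to a probable error with the conventional factor 0.6745.

```python
import math

# Sketch: is the Stenquist Assembly difference larger than chance would give?
# Assumptions: independent samples; large-sample standard error of a difference.


def standard_error_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


high_school = {"mean": 71.2, "sd": 16.8, "n": 145}   # High School R boys
city_boys = {"mean": 52.4, "sd": 19.9, "n": 19}      # P.S. B 15-year-olds

difference = high_school["mean"] - city_boys["mean"]
se_diff = standard_error_of_difference(
    high_school["sd"], high_school["n"], city_boys["sd"], city_boys["n"]
)
pe_diff = 0.6745 * se_diff   # probable error of the difference

print(f"difference of averages: {difference:.1f}")
print(f"difference / standard error: {difference / se_diff:.1f}")
print(f"difference / probable error: {difference / pe_diff:.1f}")
```

A difference several times its own probable error supports the statement in the text, though, as the authors note, it says nothing about which cause produced the difference.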

There are also available for comparison the intercorrelations of 
the General Trade Test, the Mechanical Interest Test, the Sten- 
quist Picture I and Picture II, and the Stenquist Assembly Test in 
the case of the R High School boys and the entire group of New 
York City public school boys. These intercorrelations are shown 
in Table IV, where the correlations for the New York City boys 
are given as the weighted average of the intercorrelations of the 
four age groups, 12, 13, 14 and 15.

The intercorrelations of the R High School first-year pupils and 
the intercorrelations of the larger group of the same pupils in the 
preceding year in the eighth grade prevocational school, are 
higher for the most part than the corresponding intercorrelations 
of the several tests in the case of boys in New York City public 
schools. 

TABLE IV

AVERAGE INTERCORRELATIONS OF MECHANICAL TESTS FOR FOUR GROUPS*

[Table body not recoverable from this copy. The table gave the intercorrelations of the General Trade Test, the M.I.T., Stenquist Picture I, Stenquist Picture II, the combined Stenquist Picture Tests, the Stenquist Assembly, the I.E.R. Assembly, and intelligence for the R Prevocational group (N = 208), the R High School group (N = 145), and the New York City boys and girls.]

* With a variable number of cases in the case of the New York City boys and girls. The P.E.
can be computed for the R Prevocational Group and R High School Group by means of the
following table:

             P.E. WHEN r IS OF THE VALUE
N      0.0   ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9
145    .06   .06   .05   .05   .05   .04   .04   .03   .02   .01
208    .05   .05   .05   .04   .04   .04   .03   .02   .02   .01

It is obviously impossible to tell how much of this difference is
due to difference in range of ability involved and how much to
other causes. Practically all of the intercorrelations are in the 
neighborhood of .25 to .50. This gives some hope that a com- 
bination of these tests may predict mechanical ability considerably 
better than any single one or two of them. Indeed, the low reliability
of the Stenquist (r₁₁ = .60 in grade groups) indicates that
in order to have a highly reliable mechanical scale we must either 
lengthen the time limits on the present Stenquist Test or else add 
to it tests of other abilities which are important in predicting 
mechanical aptitude. As a general policy, it is usually more 
promising to add other tests with different content than to 
lengthen the present test; this is especially true if the second test 
measures a unique but important element of mechanical aptitude. 
Such a test can be recognized, first, by its high correlation with 
an adequate mechanical criterion, and second, by its low corre- 
lation with the present test or combination of tests. 

When an adequate criterion is available, it can readily be 
determined whether the addition of any other test or the lengthen- 
ing of the present scale to twice its present length will yield the 
higher multiple correlation coefficient of the revised combination 
with the criterion. 
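
The choice between lengthening a test and adding a second test can be sketched numerically. The reliability of .60 is the figure quoted above for the Stenquist; the criterion correlations and the correlation between the two tests are hypothetical, chosen only to show the comparison. The sketch uses the standard Spearman-Brown formula, the standard formula for the validity of a lengthened test, and the two-predictor multiple correlation. It is an illustration under those assumptions, not the authors' procedure.

```python
import math

# Sketch: doubling a test versus adding a second, different test.
# r11 = .60 comes from the text; all other inputs are hypothetical.


def spearman_brown(r11, k):
    """Reliability of a test lengthened to k times its present length."""
    return k * r11 / (1 + (k - 1) * r11)


def validity_when_lengthened(r_xc, r11, k):
    """Correlation with the criterion after lengthening the test k times."""
    return r_xc * math.sqrt(k / (1 + (k - 1) * r11))


def multiple_r_two_tests(r1c, r2c, r12):
    """Multiple correlation of two tests with a criterion."""
    r_squared = (r1c ** 2 + r2c ** 2 - 2 * r1c * r2c * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


r11 = 0.60          # reliability of the present test (from the text)
r_test = 0.50       # hypothetical correlation of present test with criterion
r_new_test = 0.50   # hypothetical correlation of a second test with criterion
r_between = 0.40    # hypothetical correlation between the two tests

doubled_reliability = spearman_brown(r11, 2)
doubled_validity = validity_when_lengthened(r_test, r11, 2)
two_test_validity = multiple_r_two_tests(r_test, r_new_test, r_between)

print(f"reliability if doubled: {doubled_reliability:.2f}")
print(f"validity if doubled:    {doubled_validity:.3f}")
print(f"validity of two tests:  {two_test_validity:.3f}")
```

With these illustrative figures the two-test composite comes out ahead, which agrees with the general policy stated in the text; with a second test that correlated highly with the first, the advantage would shrink.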

Table IV also shows the average correlation for the New
York City girls of the Stenquist Assembly with the I.E.R. Girls
Assembly Test (.42). The average correlations between a
measure of intelligence and the several tests are also shown. These
are for the most part quite low, all being about .30 or below save 
in the case of the combined Stenquist Picture and the General 
Trade Tests. 


CORRELATION OF MECHANICAL INTEREST TEST AND GENERAL 
TRADE TEST 


Through the coöperation of Dr. H. S. Hollingworth, of Columbia
University, the Mechanical Interest Test and General Trade
Test were administered to 31 Columbia students who took both
forms. The correlation between General Trade and M.I.T. is
.885 ± .03. This correlation is much higher than ordinarily is
obtained between the General Trade and M.I.T. in various other 
groups which have been tested by these tests. This points to a 
possible hypothesis that mechanical tests for these academic
students in all probability function more as intelligence tests than
as mechanical tests. The correlation of .885 in a university group
indicates a very high relationship. The average score on the
M.I.T. was 60 points and on the General Trade 56 points. These
averages are to be compared with the corresponding averages in 
the case of the Camp Grant soldiers, which show that the 
university students make higher averages on both tests than do 
the soldiers; the greater difference in favor of the university 
students being in the case of the General Trade Test (mechanical 
information), a test which is known to correlate slightly higher 
(in the case of soldiers) with intelligence than does the M.I.T. 
(knowledge of the use of tools). 


                                  M.I.T.        GEN. TR.    N
Camp Grant             M          51            39          240
                       S.D.       [illegible]   21
Columbia University    M          60            56          31
                       S.D.       19            26


THE CORRELATIONS BETWEEN TESTS OF ABILITY WITH THINGS 
AND MECHANISMS AND TESTS OF ABILITY WITH IDEAS AND 
SYMBOLS 


Taking boys and girls as we find them in the public schools, 
“ability with things” is notably distinct from “ability with ideas
and symbols.” How distinct these abilities are in the original
inborn constitution of man we do not know. They may become 
more and more divorced by circumstances of life which give 
certain individuals much practice with things and little with ideas 
and symbols and vice versa. As things are, however, we get 
information about a new and large fraction of human ability when 
we add such a score as that in the Stenquist Assembly or the 
I.E.R. Assembly to a pupil’s school record and score in intelli- 
gence tests. 

To the existing evidence for the distinctness of these abilities 
our investigation adds the following: 

In the general summary table of intercorrelations (Table V) we 
show, along with many other facts, results for 435 boys and 318
girls in the case of Half-Year Gains in School, Average Conduct
in School, and Average Work in School, as related to score made 
in the Assembly Tests and to score made in the Arith.-Re. Test, 
and also the relation between the Assembly Test score and the 
Arith.-Re. Test score. The Assembly Test ability is clearly 
differentiated, especially in boys. Most of the correlations are 
below .25. 

With the assistance of Dr. L. J. O'Rourke, the Stenquist As- 
sembly Test was given to 145 high school boys in City R who had 
been tested with Army Alpha a year previously when in pre- 
vocational classes in the eighth grade. The correlation between 
Stenquist Assembly and Alpha was only .14 ± .05.
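
The figure following each correlation in this report is its probable error. A minimal sketch of the conventional formula, P.E. = 0.6745 (1 − r²) / √N, is given below; the function name is hypothetical. It is checked against two values quoted in the text: the correlation of .14 for 145 boys, and the correlation of .885 for 31 Columbia students.

```python
import math

# Sketch: conventional probable error of a correlation coefficient.


def probable_error_of_r(r, n):
    """Probable error of r for a sample of n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Stenquist Assembly with Army Alpha, 145 high school boys of City R.
pe_alpha = probable_error_of_r(0.14, 145)

# General Trade Test with M.I.T., 31 Columbia students.
pe_columbia = probable_error_of_r(0.885, 31)

print(f"P.E. for r = .14, N = 145: {pe_alpha:.2f}")
print(f"P.E. for r = .885, N = 31: {pe_columbia:.2f}")
```

The formula shows why a correlation from a small group needs to be large before much weight is put on it: the probable error shrinks only with the square root of the number of cases.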

Through the kindness of Dr. H. A. Ruger, we have records for 
82 adults in a three-hour intelligence test and also in the Stenquist 
Assembly Test. The average correlation is only .24 for men and 
.13 for women, as shown in Table VI. 


TABLE VI 


THE CORRELATION BETWEEN ABILITY WITH THINGS AND ABILITY WITH 
IDEAS: CORRELATIONS OF THORNDIKE COLLEGE ENTRANCE INTELLI- 
GENCE TEST AND STENQUIST ASSEMBLY TEST 


A. UNIVERSITY WINTER GROUP

                                                     STENQUIST
GROUP          r             P.E.          N       Av.     σ
Both sexes     .18           ±.10          41      60      21
Women          .15           ±.12          28      53      20
Men            .41           ±.16          13      74      14

B. UNIVERSITY SUMMER SESSION GROUP

                                                     STENQUIST
GROUP          r             P.E.          N       Av.     σ
Both sexes     .06           [illegible]   41      71      29
Women          [illegible]   [illegible]   25      56      19
Men            .06           ±.17          16      62      25


THE PREDICTION OF SHOP RANKS FROM THE STENQUIST ASSEMBLY,
THE STENQUIST PICTURE I AND II, AND INTELLIGENCE, WHEN
THE CRITERION CORRELATIONS ARE ESTIMATED 


At the beginning of the experiments with mechanical tests 
conducted by the Institute, it became desirable to form some 
conclusions as to whether or not a weighted composite of certain 
other so-called mechanical tests would predict shop ranks
as well as the Stenquist Mechanical Assembly Tests. If, for
instance, three paper tests properly weighted would correlate as 
well with shop ranks as does the Stenquist Mechanical Test, it 
would be useless to give the longer, more expensive assembly test 
unless the people affected for practical diagnosis by the ratings 
therefrom were markedly different people in the two cases re- 
spectively. We should emphasize this last point for it is theoret- 
ically possible that, with a correlation with shop ranks of .71 or 
less in the case of each of two different tests respectively, a 
“genius” on the one test may be rated “idiot” on the other test
and vice versa. 
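
The figure of .71 can be made concrete. For three variables, the correlation between two tests is confined to a range fixed by their two correlations with the criterion. The sketch below computes that range from the standard condition that a correlation matrix cannot have a negative determinant; the function name is hypothetical. When both criterion correlations equal √.5 (about .71) the lower limit is zero, so the two tests may be wholly unrelated to each other; below .71 the limit is negative.

```python
import math

# Sketch: permissible range of the correlation between two tests, given
# the correlation of each test with the same criterion.


def inter_test_correlation_bounds(r_xc, r_yc):
    """Lowest and highest possible correlation between tests x and y."""
    centre = r_xc * r_yc
    half_width = math.sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))
    return centre - half_width, centre + half_width


# Both tests correlate sqrt(.5), about .71, with shop ranks.
low_71, high_71 = inter_test_correlation_bounds(math.sqrt(0.5), math.sqrt(0.5))

# Both tests correlate .60 with shop ranks.
low_60, high_60 = inter_test_correlation_bounds(0.60, 0.60)

print(f"criterion r = .71: inter-test r may lie between {low_71:.2f} and {high_71:.2f}")
print(f"criterion r = .60: inter-test r may lie between {low_60:.2f} and {high_60:.2f}")
```

Two tests of equal predictive value may therefore single out quite different pupils, which is why the authors ask who is affected by each rating and not merely how high each correlation is.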

At the time only fragments of the needed correlations were
available, these being found in the work to be published
in the doctorate dissertation of J. L. Stenquist. The intercorre- 
lations of the various tests had not been obtained consistently on 
any one group, although they had been obtained in certain 
instances for practically all of the tests on various groups. 

In the absence of the exact correlations it was determined to 
resort to an approximation by having three judges estimate the 
desired correlation coefficients after having carefully read Dr. 
Stenquist’s manuscript copy of the report of his work. These 
correlations were estimated by Dr. Thorndike, Dr. Stenquist, and 
Dr. Toops, and appear in Table VII. The fifth figure in each 
compartment of this table represents the average intercorrelations 
later found actually to exist in age groups 12 to 15 inclusive. The 
intercorrelations of the tests are quite low with respect to the 
criterion correlations of the first row of the table, since the 
criterion correlations are probably too high for an age group and 
are estimated rather for the population at large.


TABLE VII

SUMMARY OF INTERCORRELATIONS OF TESTS AS ESTIMATED
BY THREE JUDGES

                                     SHOP    STENQUIST  STENQUIST  INTELLI-  STENQUIST
                 JUDGE               RANKS   PICT. I    PICT. II   GENCE     ASSEM.

SHOP RANKS       Th.                         .55        .55        .45       .75
                 St.                         .60        .60        .45       .65
                 To.                         .62        .64        .23       .76
                 Compromise                  .60        .60        .45       .75

STENQUIST        Th.                 .55                .85        .55       .75
PICT. I          St.                 .60                .60        .40       .60
                 To.                 .62                .78        .52       .70
                 Compromise          .60                .75        .55       .70
                 Observed Average                       .56        .55       .42

STENQUIST        Th.                 .55     .85                   .55       .65
PICT. II         St.                 .60     .60                   .65       .50
                 To.                 .64     .78                   .64       .66
                 Compromise          .60     .75                   .65       .65
                 Observed Average            .56                   .60       .36

INTELLIGENCE     Th.                 .45     .55        .55                  .45
                 St.                 .45     .40        .65                  .30
                 To.                 .23     .52        .64                  .25
                 Compromise          .45     .55        .65                  .45
                 Observed Average            .55        .60                  .18

STENQUIST        Th.                 .75     .75        .65        .45
ASSEM.           St.                 .65     .60        .50        .30
                 To.                 .76     .70        .66        .25
                 Compromise          .75     .70        .65        .45
                 Observed Average            .42        .36        .18

By the method of multiple ratio correlation, Stenquist Picture 
I and Picture II and Intelligence were combined to predict as 
best they would the shop ranks in the case, respectively, of the 
estimates of Drs. Thorndike, Stenquist and Toops, and also in the
compromise or modal correlation system, and finally in the case
of the true age intercorrelations using the criterion correlations of 
the compromise estimate. The results are shown in Table VIII. 
The multiple ratio correlation coefficient for the composite of 
three paper tests (Column f) was less than the correlation of 
Stenquist Assembly with the criterion in all cases save in the case 
of Dr. Stenquist’s estimates. The material for computing the
true intercorrelations of tests was not available previous to the
time of writing this report, but these intercorrelations substantiate the results found by
the estimated intercorrelations, provided it be granted that the 
criterion correlations are of the proper relative magnitudes. 

We may conclude then, so far as the evidence goes, that the 
composite of the three paper tests is somewhat inferior in terms of 
the multiple ratio correlation coefficient to the Stenquist Mechani- 
cal Assembly Test alone. The final test of this fact will, of course, 
come when a valid mechanical criterion becomes available and all 
the tests are administered to a large group of subjects. We thus 
are reasonably certain that the Stenquist Assembly Test is to date 
the most important single test contribution to the measurement of 
general mechanical ability. Even if the three-test composite 
correlated as highly with shop ranks as does the Stenquist 
Assembly Test, we might yet be justified in selecting the Stenquist 
Test in preference to the three-test composite for the reason that 
Stenquist Test correlates low with intelligence, whereas in age 
groups the combined Picture Test alone correlates in the neighbor- 
hood of .60 with intelligence. Thus the “‘failures’’ on the intelli- 
gence test will by no means necessarily be the failures on the 
Stenquist Assembly Test, although the failures on the intelli- 
gence test will for the most part be the failures on the com- 
bined Stenquist Picture Tests because of the high correlation 
between the latter two. 

A composite of the Stenquist Assembly Test with such other 
mechanical tests as we now have available (Column g) will prob- 
ably predict shop ranks considerably better than the Stenquist 
test alone (Column d). This is especially true if we are trying to 
predict rates of learning a mechanical process, or length of time 
needed to acquire a given amount of trade proficiency. 

With the assumption of the intercorrelations involved in the 
true age intercorrelation group, when Picture II, Picture I, and 
Intelligence in turn, in order of decreasing amounts of contribution
to the multiple ratio correlation, are added to the Assembly Test
to make up a composite scale, the multiple ratio correlation coefficient
is .85. In this case the relative weights (β), which are to
be divided by their standard deviations in order to determine the
gross score weights, are as follows:


TEST                      β WEIGHT
Assembly                  1.000
Picture II                .618
Picture I                 .359
Intelligence              .128


The added practical effectiveness of a scale which would cor- 
relate .85 with shop ranks, as compared with the Stenquist 
Assembly Test which correlates .75 with shop ranks as given in 
the table, is well worth the effort required to obtain it. The 
standard error of estimate of the composite scale is then (r =.85) 
only .53 of the standard deviation of shop ranks. 
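
The figure of .53 can be checked with a short Python sketch. It assumes the usual expression for the standard error of estimate in units of the criterion's standard deviation, the square root of (1 − r²); the function name is ours, not the report's.

```python
from math import sqrt


def standard_error_of_estimate(r):
    """Standard error of estimate as a fraction of the criterion's standard deviation."""
    return sqrt(1.0 - r * r)


# Compare the Stenquist Assembly Test alone (r = .75) with the composite (r = .85).
for r in (0.75, 0.85):
    print(f"r = {r:.2f}: standard error of estimate = {standard_error_of_estimate(r):.2f} sigma")
```

On this assumption the error falls from about .66 of the standard deviation of shop ranks at r = .75 to about .53 at r = .85.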

As a matter of fact, the universally low intercorrelations among
the so-called mechanical tests used in the past invite our atten-
tion to the desirability of combining many of these into a com-
posite scale in order to predict mechanical ability. Mechanical
ability is probably quite as ‘“‘general’’ as general intelligence. On 
the other hand, we have shown in this investigation that general 
intelligence can be adequately measured by two or more of the 
standard school tests provided we are primarily interested in the 
aspect of the correlation of the test with a criterion of ability to 
progress in school. In the same way it may be found that one or 
two rather basic mechanical tests will give practically as good 
prediction of mechanical capacity as the two educational tests do 
in predicting school progress ability. This seems reasonable 
from the fact that, so far as tried out, none of the paper mechani- 
cal tests give very high correlations with an adequate criterion. 
A composite scale requires tests which shall correlate low among 
themselves relative to the correlations with the criterion. The 
current mechanical tests fulfill the first of these requirements, 
low correlations among themselves, but probably do not fulfill so 
well the second, high correlations with a criterion. 
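
The two requirements for a composite scale can be illustrated with the standard formula for the multiple correlation of two tests with a criterion. The sketch below is in Python; the validity of .50 assigned to each test is a hypothetical figure chosen for illustration, not a result of this investigation.

```python
from math import sqrt


def multiple_r_two_tests(r1c, r2c, r12):
    """Multiple correlation of an optimally weighted pair of tests with a criterion.

    r1c, r2c: correlations of each test with the criterion.
    r12: intercorrelation of the two tests.
    """
    r_squared = (r1c ** 2 + r2c ** 2 - 2.0 * r1c * r2c * r12) / (1.0 - r12 ** 2)
    return sqrt(r_squared)


validity = 0.50  # HYPOTHETICAL correlation of each test with the criterion
for r12 in (0.20, 0.40, 0.80):
    composite = multiple_r_two_tests(validity, validity, r12)
    print(f"intercorrelation {r12:.2f}: composite correlates {composite:.3f} with criterion")
```

The lower the intercorrelation of the two tests, the more the composite gains over either test alone, provided each test correlates appreciably with the criterion.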

Two of the promising types of mechanical tests for such a 
combination would seem to be an assembly test without a model, 
the product not being predetermined, such a test as is found in 
the Stenquist Assembly Test, and a test of imitating a model, 
such as the I.E.R. Girls’ Assembly Test. These tests correlate
only about .40 with each other in the case of age groups, while, 
presumably, both correlate fairly well with an adequate criterion. 
More adequate predictions than are now possible can readily be 
secured by multiplying the length of any of the forms of the 
present mechanical tests. The increase in reliability of the 
Stenquist test to be obtained by doubling the number of test 
models has been mentioned in another place. The reliability of 
the I.E.R. Assembly is not known at the present time, but this 
test likewise will increase both in reliability and validity if given 
over a longer period of time with more models. In all test work 
the implicit assumption is that the few tasks which the subjects 
attempt in the short period of time are, if the test is a good test, 
a good sampling of the sum total of reactions of which the in-
dividual is capable, called his “ability.” If fifteen minutes’
sampling of such abilities is but a poor sampling of the total of 
such abilities, we have but to increase the test to thirty minutes, 
forty-five minutes, one day, three days, three months, three years, 
or what not, or until we have had the individual actually perform 
all of the reactions of which he is capable, whereupon, theoreti- 
cally, we would have a perfect measure of the ability which we are 
trying to measure. The good test is that one which will corre- 
late highly with the sum total of such abilities when the test is 
administered in a short, that is, “practical,” amount of time and
with a minimum of scoring and other administrative labor. This 
criterion is not a practical one which can be statistically approxi- 
mated; hence we must rely on securing an adequate criterion. 
The perfect test of mechanical ability would be one which would 
cover many weeks, the individual being required in every suc- 
cessive hour of the time to do new mechanical tasks and to be 
objectively scored upon each task in turn. Stenquist has at- 
tempted to abbreviate this process to one-half hour. It is the 
writer’s conviction, based upon known testing principles, that the 
time should be at least doubled. And even two hours is little 
enough time for a person to employ in determining his mechanical 
capacity. A test of the Stenquist type which would involve four 
times as many items would by no means require four times as long 
to score. When given in groups, the extra time per person 
required for administration of the test is negligible; the only 
increases worth considering at any length are the increases in time 
and labor required in the scoring, and the cost of test materials. 
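
The gain in reliability to be expected from lengthening a test can be sketched in Python. The sketch assumes the Spearman-Brown formula, which the text does not name, and takes as its starting point the reliability of .60 quoted elsewhere in this report for the Stenquist Assembly Test in seventh and eighth grade groups of boys.

```python
def spearman_brown(reliability, k):
    """Estimated reliability of a test lengthened k times with comparable material."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)


present_reliability = 0.60  # figure quoted elsewhere in this report
for k in (2, 4):
    lengthened = spearman_brown(present_reliability, k)
    print(f"test lengthened {k} times: estimated reliability {lengthened:.2f}")
```

On these assumptions, doubling the test would raise the reliability from .60 to about .75, and quadrupling it to about .86.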


THE PREDICTION OF SUCCESS IN PREVOCATIONAL COURSES IN
CARPENTRY, PRINTING, FORGING, AND THE LIKE, BY VARIOUS 
TESTS 


During the winter of 1920 and 1921, a preliminary investiga- 
tion of the talents possessed by 208 prevocational boys in the 
Public School R, eighth grade, was conducted by Dr. Toops and 
Dr. O’Rourke. The army form of the General Trade Test, the 
Mechanical Interest Test, and the Army Alpha were given to all 
boys. In addition, their ages to the nearest tenth part of a year 
were determined and used in the intercorrelations. As a criterion
of proficiency in prevocational courses, teachers’ school marks 
were used. These undoubtedly took into account more of the 
factor of intelligence than would a longer course in mechanical 
work, since the instructor had each boy but a few periods. This 
preliminary investigation showed that the General Trade and the 
Mechanical Interest tests correlate highly enough with this 
criterion, compared to the Army Alpha, to justify administering
the additional tests, Stenquist Picture Tests I and II and the
Stenquist Assembly Test, to those of the original eighth grade
group who had entered high school. Of the original 208 boys it
was possible to test 145 who had entered high school. The re- 
tests were given by Dr. L. J. O’Rourke, assisted by the Institute 
of Educational Research. 

It is probably true that between the period of the eighth grade 
and entering high school, the elimination was selective in both 
intelligence and mechanical ability. A report of the significant 
relationships found in the original investigations and in the re-
tests is given on the following pages.¹

Each student selected ordinarily two and generally three 
courses from the following list which he pursued for eight weeks, 
six hours each week: Electrical, Carpentry, Sheet-metal, Foundry, 
Printing, Forge, Machine-shop, Pattern-making. 

The Preliminary Investigation Conducted by Toops and O’Rourke,
1920-1921. In this investigation 208 cases were available. As 
a criterion of mechanical ability each subject’s school marks on 
the courses completed were averaged. Corrections for differences 
in the average marks of different trade courses were made. This 
assumes that average students in each class had equal mechanical 
abilities, an assumption probably only approximately true. 


¹ Prepared by Dr. Herbert A. Toops from data collected by himself and Dr. L. J.
O’Rourke. 
The intercorrelations are shown in Table IX:


TABLE IX


INTERCORRELATIONS OF TESTS OF 208 PREVOCATIONAL BOYS*


                                     GENERAL     MECHANICAL
VARIABLE                CRITERION    TRADE       INTEREST     ALPHA     AGE
Criterion ............      x         .41          .33         .19     −.11
General Trade ........     .41         x           .70         .42      .12
Mechanical Interest ..     .33        .70           x          .30      .10
Alpha ................     .19        .42          .30          x      −.26
Age ..................    −.11        .12          .10        −.26       x

σ of Variable ........ [illegible] [illegible]    14.7        20.5     1.04

* P.E., when N = 208 cases:
  r     = [nine values, illegible in the source]
  P.E.r = .05  .05  .05  .04  .04  .04  .03  .02  .02


The two mechanical tests correlate .70±.02 with each other;
they correlate low with intelligence: Army Alpha and General
Trade, r=.42±.04; Mechanical Interest and Army Alpha, r=
.30±.04. Age correlates positively, but low, with both mechani-
cal tests. All facts seem to show that mechanical ability is more
dependent upon age than is school ability or intelligence. In the 
above table, age correlates negatively with Army Alpha, as is 
usually the case in a school grade group. 
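
The probable errors attached to these coefficients can be reproduced approximately with a short Python sketch. It assumes the conventional formula P.E. = .6745 (1 − r²) / √N, which the report does not state explicitly.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Coefficients quoted in the paragraph above, N = 208.
for r in (0.70, 0.42, 0.30):
    print(f"r = {r:.2f}: P.E. = {probable_error_of_r(r, 208):.2f}")
```

Rounded to two places, the assumed formula gives .02, .04, and .04 for the three coefficients, in agreement with the values printed in the text.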

The multiple ratio regression equation for combining the four 
variables for predicting the criterion is: 

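\[
\frac{X_{c}}{\sigma_{c}} = \frac{X_{\text{Gen.Tr.}}}{\sigma_{\text{Gen.Tr.}}} + .2487\,\frac{X_{\text{M.I.T.}}}{\sigma_{\text{M.I.T.}}} + .0426\,\frac{X_{\text{Alpha}}}{\sigma_{\text{Alpha}}} - .4908\,\frac{X_{\text{Age}}}{\sigma_{\text{Age}}}
\]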


The accumulating test composite correlates with the criterion 
to the extent: 


ACCUMULATING TEST COMPOSITE              CORRELATION WITH CRITERION
General Trade alone ...................        .412±.04
Gen. Tr.+M.I.T. .......................        .416±.04
Gen. Tr.+M.I.T.+Alpha .................        .417±.04
Gen. Tr.+M.I.T.+Alpha+Age .............        .446±.04
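
As a rough check on these figures, the Python sketch below computes the ordinary least-squares multiple correlation from the two-place coefficients of Table IX, adding the tests in the same order. This is not the multiple ratio procedure used in the report, so agreement only to within a few thousandths is to be expected; the small linear-equation solver is included to keep the sketch self-contained.

```python
from math import sqrt

# Correlations as read from Table IX (order: General Trade, M.I.T., Alpha, Age).
R_TESTS = [
    [1.00, 0.70, 0.42, 0.12],
    [0.70, 1.00, 0.30, 0.10],
    [0.42, 0.30, 1.00, -0.26],
    [0.12, 0.10, -0.26, 1.00],
]
R_CRITERION = [0.41, 0.33, 0.19, -0.11]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def multiple_r(k):
    """Multiple correlation of the first k tests with the criterion."""
    a = [row[:k] for row in R_TESTS[:k]]
    b = R_CRITERION[:k]
    betas = solve(a, b)
    return sqrt(sum(beta * r for beta, r in zip(betas, b)))


labels = ["Gen. Tr.", "+M.I.T.", "+Alpha", "+Age"]
accumulating = [multiple_r(k) for k in range(1, 5)]
for label, value in zip(labels, accumulating):
    print(f"{label:10s} R = {value:.3f}")
```

The same pattern appears as in the printed table: the Mechanical Interest test and Army Alpha add almost nothing to the General Trade test, while age raises the multiple correlation to roughly .45.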


The Follow-up Investigation of Boys Who Entered High
School: There were 145 cases of the retest group in the High 
School who had complete scores on Stenquist I, Stenquist II, 
Stenquist Assembly, Age, M.I.T., General Trade. The computed 
correlations were as follows: 








N=145

CORRELATION                                 STENQUIST                          GENERAL
BETWEEN           STEN. I      STEN. II     ASSEMBLY      AGE        M.I.T.    TRADE
Army Alpha and    .08±.06      .33±.05      .14±.06     −.22±.05    .30±.05    .23±.05


Intelligence, measured one year, is but little related to mechani- 
cal ability measured the following year. The correlations on the 
retest group with other variables the year previous are substan- 
tially the same as in the case of the 208 subjects. 

Dr. O’Rourke then selected 100 cases of the retest group and 
correlated them with the old criterion in their actual scores, that 
is, without grouping the gross scores. (There are thus 14 classes 
instead of the original § classes.) The correlations are: 

















N=100

VARIABLE NO.           3          4          5          7          8          9
                                                                            STEN.
VARIABLE             M.I.T.    GEN. TR.    ALPHA     STEN. I   STEN. II   ASSEMBLY
Correlation with
Criterion ........  .02±.07    .07±.07    .20±.06    .02±.07    .10±.07    .00±.07





There was no relationship in evidence in the case of any of the 
variables. These results led him to believe that the criterion 
must be unreliable. Accordingly, he computed the following 
correlations of prevocational school marks with a composite of 
2 General Trade gross scores + Army Alpha gross scores (weight-
ing General Trade about 14 times as much as Army Alpha when
σ’s are considered).
COURSE                   CORRELATION         N
Sheet-metal ..........   [illegible]±.08     59
Foundry ..............       .13±.08         72
Printing .............       .04±.08         68
Forge ................       .09±.09         60
Machinist ............      −.07±.08         68
Pattern-making .......       .17±.10         44





He accordingly left machinist and printing out of the old 
criterion, calling it a revised criterion, and took into account the 
σ’s of the several scores, weighting each of a subject’s mechanical
marks equally. By correlating the tests with the revised criterion
corrected for range (σ’s and averages taken into account),
printing and machinist out, the following r’s result:














N=98 cases

                             CORRELATION WITH
TEST                         REVISED CRITERION       P.E.
General Trade ............      [illegible]          ±.06
Army Alpha ...............         .15               ±.07
Stenquist I ..............         .00               ±.07
Stenquist II .............      [illegible]          ±.07
Stenquist Assembly .......         .02               ±.07
M.I.T. ...................         .16               ±.07








We might conclude from this experiment that in this pre- 
vocational school intelligence is more important in the mind of the 
instructor when grading his pupils than is the type of ability 
measured by the Stenquist Assembly Test. This varies from 
course to course where different instructors are involved. Thus 
we have the customary high unreliability of teachers’ marks in 
academic subjects made more unreliable in these vocational 
courses by reason of the instructors having only a few contacts 
with the pupil before the final mark is given. 

The significant conclusion for vocational guidance would seem 
to be that such prevocational courses, while perhaps of inesti-
mable worth in teaching the pupil a few fundamental facts about
the trade and giving him a basis for interest in the trade, are of no 
value for testing purposes if the instructor makes the rating on 
such casual relationship and in the ordinary manner in which 
school marks are made. It does not necessarily follow that 
objective methods of rating the proficiency of boys in such pre- 
vocational courses cannot be made. In fact, there is every 
reason to believe that such objective rating can readily be made 
by an instructor who is conversant with the principles of mental 
testing, and that such ratings would be of great worth in guiding 
pupils into occupations in which they have a high chance of 
success. 

Toops¹ has shown that a similarly conducted bureau of voca-
tional guidance, which has the pupils in its charge for two weeks,
is able to make subjective ratings which are of worth and which 
would be of very much more worth provided one used more 
objective methods of scoring and better statistical methods of 
evaluating the test results. 

The re-tests do bear out the previous findings of low correlations
of intelligence with mechanical measures, especially with the
manipulative type of ability required in the Stenquist Mechanical
Assembly Test. They also bear out the inference that improvement
in one’s mechanical status is more dependent upon age than is 
improvement in intelligence or in school work. 

From the point of view of testing technique, this experiment 
emphasizes the necessity of carefully planning in advance one’s 
complete testing program and especially assuring himself in 
advance that the criterion against which he hopes to measure the 
validity of his test, is really reliable and measures what it purports 
to measure. School marks, which have little variability, will 
probably be unreliable as a criterion. 


RELATIVE VALIDITY OF MECHANICAL INTEREST TESTS AND THE 
GENERAL TRADE TEST IN PREDICTING TEACHERS’ ESTIMATES 
OF POTENTIAL ABILITY IN THE TRADE COURSES


The automotive, electrician, machinist and bookkeeping groups 
of students at the Camp Grant summer school of 1920 were rated 
by their instructors for potential ability in the respective courses. 


¹ Toops, H. A. Trade Tests in Education, pp. 76-95. Teachers College Con-
tributions to Education, No. 115.
Since the General Trade test is known, by its correlations, to
depend more upon intelligence than does the Mechanical Interest 
test, it becomes desirable to obtain some knowledge of the validity 
of the two in predicting mechanical ability. The evidence which 
is available is shown in Table X below.


TABLE X


THE CORRELATIONS OF GENERAL TRADE TEST AND M.I.T. WITH INSTRUCTORS’
ESTIMATES OF POTENTIAL ABILITY OF STUDENTS IN FOUR MECHANICAL
COURSES


                 CORRELATION     CORRELATION      CORRELATION     CORRELATION
                 OF POTENTIAL    OF POTENTIAL     OF M.I.T.       OF POTENTIAL
COURSE           ABILITY AND     ABILITY AND      AND GENERAL     ABILITY AND
TAKEN            M.I.T.          GENERAL TRADE    TRADE TEST      140-QUESTION
                                 TEST                             SET OF THE
                                                                  GEN. TR. TEST
Automotive ....    .05±.09         .19±.08          .57±.05         .26±.08
                        61              65               87              65
Electrician ...    .50±.09         .51±.08          .68±.06         .53±.08
                        34              35               34              36
Machinist .....    .43±.10         .46±.11          .73±.07         .47±.11
                        24              23               23              23
Bookkeeping ...   −.01±.12         .02±.13          .70±.06        −.02±.13
                        33              28               37              28


The bold face (upper) figure in each compartment is the correlation
of the particular test in the
column with the teachers’ estimates of potential ability in the
course shown in the row respectively to which the correlation
coefficient belongs. Thus the Mechanical Interest test correlates
with estimates of potential ability in the automotive course to the
extent of .05±.09, and the same test correlates with estimates of
potential ability in the electrical course to the extent of .50±.09.
The small figure in the lower right-hand corner in each compart- 
ment refers to the number of cases on which the correlation 
coefficient is based. These numbers of cases vary from corre- 
lation coefficient to correlation coefficient because of incomplete- 
ness in the records of tests given at the time of entrance to the 
courses. These tests were all given before the students began the 
courses and consequently are to be looked upon as being true 
measures of the efficiency of the respective tests in giving guid-
ance. The automotive courses were taught by several instructors
and consequently the ratings are necessarily attenuated. The 
General Trade test is the only test which promises to be of value 
in this course. In the electrical and machinist courses, both the 
Mechanical Interest test and the General Trade test correlate with 
potential ability about .50 and .45 for the two courses respectively. 
Neither test predicts bookkeeping ability. The correlations be- 
tween the Mechanical Interest test and the General Trade test, in 
the various courses, are about .70, which is considerably higher 
than in the case of age groups of New York City boys. 

It will be noted that these boys are boys who are for the most 
part of considerable mechanical inclination and mechanical ex- 
perience and have enough interest in mechanical things to have 
at least enrolled for mechanical courses. The corresponding 
intercorrelation of these two tests in the case of the public school 
R prevocational boys is likewise .70; these boys live in a small city 
with varied mechanical environment, and are very much more 
self-reliant mechanically than New York boys. The intercorre- 
lation of the two tests in the case of the New York City public 
school boys who have a limited mechanical environment and are 
not taking any mechanical courses, averages .50 for age groups. 
It seems likely that, if ranges in ability were equated, the New 
York City boys would still correlate less highly between these two 
tests than either of the other two groups mentioned above. This 
fact lends support to the belief that practice in mechanical 
abilities, whether of the paper or of the actual manipulatory 
variety, increases the correlations between the functions tested. 

The fourth column of the table shows the correlations of the 
selected 140-question set of the General Trade test, thousands of 
copies of which were administered to soldiers in the Army E and 
R schools during the winter of 1920 and 1921, with the esti- 
mates of potential ability. This revised set of the General 
Trade test has correlations which are almost identical with the 
longer (204-question) set. 


THE INTERCORRELATIONS OF VARIABLES BEARING ON 
PROFICIENCY IN THE MACHINISTS’ COURSE 
At the completion of the six weeks’ course in machine shop 
practice at the Camp Grant summer school in 1920, 24 machine 
shop students rated each other in regard to their potential 
(ultimate) mechanical ability, and were rated also by their
teachers. In addition, the school marks given by the instructors,
and results of a one-word-answer trade test, or objective
measure of final proficiency, were available. 

The students rated only those students whom they knew well 
by picking out the best man, whom they called “best,” the second
best man, whom they called “good,” the worst man, whom they
called “poorest,” and the second poorest man, whom they called
“poor.” By an arbitrary procedure, these ratings were combined
into a final average proficiency rating, it being assumed that any
man who was not rated at all by some one of the 24 students was
“average.” The school marks had been recorded by the in-
structors on graphical charts in which each of the number of
operations into which machine shop was divided had a graphical 
record of the amount of progress so far attained in each operation 
indicated by the length of line drawn; the summation in inches of 
the total length of line drawn for each individual was taken as the 
school mark. 

The teachers’ estimates of potential ability were given on a 
scale from 1 to 5 after the standard method used in the army, in
which the percentage of people distributed to these five steps is 
taken to be such that a normal distribution of ability will result. 
The trade test score of final proficiency was a set of one-word- 
answer questions, bearing upon machine shop practice which the 
students presumably had had opportunity to acquire during 
their course. 

In addition, the scores on the Mechanical Interest test, the 204- 
question General Trade test and the revised 140-question General 
Trade test were available. The intercorrelations of all these tests 
by the rank difference method are shown in Table XI. 
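
For readers unfamiliar with the rank difference method, a minimal Python sketch follows. It assumes the usual formula, ρ = 1 − 6Σd² / n(n² − 1), and scores without ties; the two lists of scores are hypothetical and are not taken from the Camp Grant records.

```python
def ranks(scores):
    """Rank 1 goes to the highest score; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_difference_correlation(x, y):
    """Correlation by the rank difference method for paired scores x and y."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# HYPOTHETICAL scores of six students on two measures, for illustration only.
school_marks = [88, 75, 92, 60, 70, 81]
trade_test = [40, 35, 38, 20, 30, 42]
rho = rank_difference_correlation(school_marks, trade_test)
print(f"rank difference correlation = {rho:.2f}")
```

Only the order of the students on each measure enters the computation, which is why the method suits ratings and graphical marks whose units are arbitrary.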

TABLE XI


INTERCORRELATIONS BY THE RANK DIFFERENCE METHOD OF VARIABLES
IN THE MACHINIST COURSE


It is interesting to note that although the students’ estimates of
potential ability correlate only .55 with the teachers’ estimates of
potential ability, and although they correlate with school marks
only to the extent of .63, whereas the teachers’ estimates of
potential ability correlate with school marks to the extent of .88,
the students’ estimates of potential ability correlate higher,
.58±.09, with trade test measure of final proficiency at the end of the
six weeks’ course than do the teachers’ estimates of potential
ability, .53±.10. The Mechanical Interest test and the General
Trade test in both forms correlate in the neighborhood of .45
with teachers’ estimates of potential ability. It undoubtedly is
true that students are able to learn facts of worth regarding their
fellow-workmen’s trade capacities that are unobserved by the
teacher. It is likely that a combination of the students’ esti-
mates with the teachers’ estimates would be a more reliable
measure of trade ability than either alone.
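
One way to gauge what such a combination might yield is sketched below in Python. It assumes that the two estimates are put in standard-score form and simply added, that the trade test of final proficiency serves as the criterion, and that the rank difference coefficients quoted above may be treated as ordinary correlations; none of these assumptions is made in the report itself.

```python
from math import sqrt


def unit_weight_composite_validity(r1c, r2c, r12):
    """Correlation with a criterion of the sum of two measures in standard-score form."""
    return (r1c + r2c) / sqrt(2.0 * (1.0 + r12))


students_with_trade_test = 0.58
teachers_with_trade_test = 0.53
students_with_teachers = 0.55

composite = unit_weight_composite_validity(
    students_with_trade_test, teachers_with_trade_test, students_with_teachers
)
print(f"combined estimate correlates about {composite:.2f} with the trade test")
```

Under these assumptions the combined estimate would correlate about .63 with the trade test, somewhat higher than either estimate alone.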


CHAPTER SV 


TESTS OF ABILITY WITH THINGS AND MECHANISMS: 
GIRLS’ TESTS


Many investigations of school progress involving females have 
been carried out from which the general conclusion has been 
reached that sex differences may be neglected for the most part. 
Indeed, separate norms for the sexes have seldom even been 
compiled, a fact which shows the unimportance of sex distinctions 
in academic school work. 

Practically nothing has hitherto been done in investigating 
female mechanical ability. By popular opinion, women are 
credited with much less mechanical ingenuity than men. Obser- 
vation of the mechanical environment of the average woman 
might readily lead one to believe that, whatever may be the facts 
in regard to her innate mechanical capacity, the average woman 
must surely have failed to develop it up to a present working 
ability on a par with mechanical ability of the average man. 

It is not essential that the two sexes be tested on the same tests, 
for sex is a primary classification, of perfect reliability if deter- 
mined and recorded at the time of the examination, and of high 
reliability if determined from the names alone. If marked sex 
differences do occur, it would be better to use different mechanical 
tests for the sexes because of the higher validity which can thus be 
secured; if the sex differences in reality are not marked, then a 
tryout of tests, made with the aim of differentiating between the 
sexes, would reveal the lack of significant sex differences, where- 
upon one could discard the plan of different tests for the sexes. 
Without, then, any presuppositions in regard to the
innate or acquired mechanical differences of the sexes, it was
decided to construct a girls’ mechanical assembly test which 
should aim to duplicate for girls the test situations afforded boys 
in the Stenquist Mechanical Assembly Test. Tests of girls in the 
meanwhile had shown that girls do poorly on the Stenquist 
Assembly Test. It seemed desirable to base the tests on ability 
to construct a model which is present at the time of the test. 
Twelve models were finally selected from a much larger number 
originally considered. The selection of the twelve was made on
the basis of scoring objectivity, time required, and whether the 
test seemed from a priori reasoning to be a component of general 
mechanical ability. 


THE PRELIMINARY TRY-OUT OF THE I.E.R. ASSEMBLY TEST 


After forty sets of the twelve tests had been assembled, they 
were given a preliminary try-out on 38 members of a 6A1 class, or
“bright” section of the second semester of a sixth grade class.
The examiners kept careful watch at the time of beginning each 
of the separate test elements, and recorded the cumulative time 
as soon as five pupils had begun a given element. In this way, it 
was observed that threading three needles, the first test of the set, 
required too long a time limit, 10 minutes in the case of many of
the pupils and about 6 minutes for the average pupil of the grade, 
their average age being 10.7 years. The other elements required 
on the average about 4 minutes each before 5 pupils of the 38 
began them; from these observations, it was decided that about 
45 minutes was a sufficient over-all time for eleven test elements, 
for it was apparent that threading needles required too much time 
to be a practical test. 

All pupils were stopped at the end of 86 minutes even if they 
had not finished; however, about 45 per cent of the class had 
finished by the end of 80 minutes. 

The order of difficulty, from easiest to hardest, of the twelve 
test elements is as follows: 


ORIG-                                      TOTAL CREDITS          PER CENT
INAL      FINAL                            EARNED BY 38 6A1       OF POSSIBLE
LETTER    LETTER    TEST ELEMENT           PUPILS PER ELEMENT     380 CREDITS
  C         B       Inserting Tape ......        320                  84
  B        out      Needle Threading ....        269                  71
  A         A       Stringing Beads .....        258                  68
  D         C       Rosette .............        244                  64
  F         D       Cross Stitch ........    [illegible]              57
  I         E       [name illegible] ....        160                  42
  E         F       Clip Chain ..........        132                  35
  H         K       Trimming Paper ......        101                  27
  K         G       Tape Sewing .........        100                  26
  J         H       [name illegible] ....         87                  23
  I         I       Card Wrapping .......         57                  15
  G         J       Booklet .............         48                  13
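
The percentages in the last column follow directly from the credits earned, as the Python sketch below illustrates for the elements whose names and credits are legible. It assumes ten credits per element for each of the 38 pupils, which is consistent with the 380 possible credits of the table heading.

```python
POSSIBLE_CREDITS = 380  # assumed: 38 pupils x 10 credits per element

# Total credits earned per element, as read from the table above.
credits_earned = {
    "Inserting Tape": 320,
    "Needle Threading": 269,
    "Stringing Beads": 258,
    "Rosette": 244,
    "Tape Sewing": 100,
    "Card Wrapping": 57,
    "Booklet": 48,
}


def per_cent_of_possible(earned, possible=POSSIBLE_CREDITS):
    """Credits earned as a whole-number percentage of the possible credits."""
    return round(100.0 * earned / possible)


# Order of difficulty, easiest first: the larger the percentage, the easier the element.
by_difficulty = sorted(credits_earned, key=credits_earned.get, reverse=True)
for element in by_difficulty:
    print(f"{element:18s} {per_cent_of_possible(credits_earned[element]):3d}")
```

Sorting on these percentages is what fixes the order of difficulty from easiest to hardest.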
“Threading needles” was eliminated from further consideration
because of the time required. “Trimming paper” was shifted to
the last position as K of the revised series. The positions of all
the other tests were assigned to them on the basis of the above
percentages, with the exception that “stringing beads” and
“inserting tape” were reversed, in order to place “stringing
beads” as the first test even though it is slightly more difficult
than “inserting tape.” This was done as it was felt that the box,
which is the only variation from the envelope containers of the 
other models, could most readily be explained if given first. 
Since, also, every girl has at some time in her life strung beads, it 
was a task which would appeal to most of them. The principal 
of the school took the test with the pupils and made 107 out of a 
possible 110 points on the final selection of eleven elements. 

The credits earned on each of the eleven elements, A to K, of
the final series as adopted for the final scale, by various grade
groups tested later are given in Table XII. C, “rosette,” is
somewhat easier than its position would indicate, and its position
might be shifted from third place to second by exchanging places
with “tape inserting.” Similarly, “paper trimming” is somewhat
easier than its position would indicate, and might be placed just
before J, the booklet. It was subsequently placed as K for the
reason that there was difficulty in the subjects knowing what was 
required, a difficulty which it is believed will be partially obviated 
by the present revised printed form. The differences in difficulty 
are so slight that, for the present, it is scarcely worth the while to 
make these changes. The order of difficulty of the test elements 
is substantially the order in which they are placed in the scale. 


RESULTS FROM TESTS OF BOYS ON THE I.E.R. GIRLS’
ASSEMBLY TEST 


The I.E.R. Girls’ Assembly Test was administered to 30 boys in 
the 7A grade of public school B. As shown by Table XIII, these
boys have slightly higher averages in the Stenquist Assembly,
the Arithmetic-Reading intelligence combination, and the combined
Stenquist Picture tests than 13-year-old boys in general. As a group
they are thus more than equal to the average 13-year-old boy in 
all these tests. When compared with the norm of the 13-year-old 
girls on the I.E.R. Girls’ Assembly Test, the average of the boys’ 
scores, 38.2, is about a 35-percentile 13-year-old girls’ performance. 
TABLE XIII


THE AVERAGES AND STANDARD DEVIATIONS OF THE FOUR TESTS GIVEN TO
THIRTY 7A BOYS, AND TO ALL 13-YEAR-OLD BOYS OF PUBLIC SCHOOL B


                                                                      COMBINED
                                 I.E.R.       STEN.        ARITH.-    STEN.
                                 ASSEMBLY     ASSEMBLY     RE.        PICTURE
Averages
  N=30 Our Group of Boys ....     38.2      [illegible]  [illegible]    28.4
  All 13-Year-Old Boys ......      ..       [illegible]    64.8†        26.8
σ’s
  N=30 Our Group of Boys ....     16.7      [illegible]  [illegible]     6.7
  All 13-Year-Old Boys ......      ..       [illegible]    18.6          9.6

† Column uncertain in the source; the figure may belong under STEN. ASSEMBLY.








Our small test group has slightly higher averages in Stenquist 
Assembly, Intelligence and Combined Stenquist Picture tests, 
with more variability in Stenquist Assembly but less variability 
in Intelligence and Combined Stenquist Picture tests than the 
13-year-old boys in general. 

The correlations of the I.E.R. Assembly Test with the three 
above mentioned tests are shown in Table XIV. 


TABLE XIV 


CORRELATIONS OF THE I.E.R. GIRLS’ ASSEMBLY TEST WITH THREE OTHERS
IN THE CASE OF THIRTY 7A BOYS IN PUBLIC SCHOOL B


                                              CORRELATION
                                              WITH I.E.R.
Arithmetic-Reading Intelligence ..........     .22±.12
Combined Stenquist Picture ...............     .50±.09
Stenquist Assembly .......................     .53±.09


This table shows that the I.E.R. Assembly Test does not
depend very much upon intelligence in the case of boys, and that 
it does depend upon the abilities measured by other mechanical 
tests about as highly as these other tests depend upon each 
other. 
MECHANICAL TESTS IN THE CASE OF AN UNGRADED CLASS OF
GIRLS WITH CONCLUSIONS IN REFERENCE TO THE IMPROV-
ABILITY THROUGH PRACTICE ON THE STENQUIST ASSEMBLY
TEST AND THE I.E.R. GIRLS’ ASSEMBLY TEST


The two mechanical assembly tests were administered to 15 
pupils above the age of 11 in the girls’ school, public school G, 
who had done such poor academic work that they had been placed 
in the ungraded class. This class consists only of those whom the 
school authorities consider to be so mentally defective as to be 
incapable of worthwhile progress in the normal classes. In the 
ungraded class they are given individual attention by the teacher, 
with a large amount of emphasis upon handwork and training in 
household duties rather than upon the formal academic type of 
instruction. The average Stenquist Assembly Test score for 
these 15 ungraded cases, average age of 14.3, was 26 points; the 
average score of all pupils in the school, 180 cases, 13 years of age 
or over found in grades 3B to 6B inclusive was only 18 points. 
In other words, although the members of the ungraded class are 
considered mentally defective by the school authorities, they made 
8 points higher average score on the Stenquist Assembly Test 
than the general run of pupils in the school whose average age is 
in excess of 13 years. This difference might be thought of as 
possibly due to the difference in age of the two groups; but when 
we examine the norms of the 13- and 14-year old girls, we find 
that the norm of the 13-year-old girls is about 19 while that of the 
14-year-olds is 23. These norms include the brighter 13- and 
14-year-olds from public school J, which makes this an adequate 
sampling of those ages. Our ungraded group is therefore at least 
3 points superior in the Stenquist Assembly Test to the median of 
all the 14-year-old girls. 

The average I.E.R. Girls’ Assembly Test score of these un-
graded pupils was 31. Surprisingly, this is 13 points lower than
the norm for either the 13-year-old or the 14-year-old girls, both 
ages having the same norm. The I.E.R. Assembly Test is, 
however, known to depend more upon intelligence than does the 
Stenquist Assembly Test, as determined by their respective 
correlations with intelligence. We would naturally expect an 
unusual amount of mechanical practice to be most effective in 
producing differences in the girls’ average mechanical test scores 
when that practice is directed into such mechanical channels as 
the majority of girls do not enter, namely, those of the boys’ 
mechanical tests. 

The correlation between the Stenquist Assembly Test and the 
I.E.R. Assembly Test in the case of these girls is .80. This 
correlation is twice the magnitude of the average intercorrelation 
of age groups for these tests, which indicates a relationship that is 
several times as great when measured in terms of the reduction of 
the standard error of estimate. Part of this relationship is due 
to the larger range of ability included; and without being able to 
estimate the effect of difference in range of ability, we are unable 
to predict to what extent practice in mechanical things has improved 
the correlation between the two tests. It seems hardly 
possible for difference in range of ability to account for all the 
difference in correlation, since this correlation coefficient of .80 
is considerably higher than the reliability coefficient of the Stenquist 
Assembly Test (.60) for seventh and eighth grade groups of 
public school boys. The evidence thus points to a marked in- 
crease in relationship between the two tests due to the practice 
in mechanical things to which this class had been subjected. 
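
The comparison of correlations in terms of the standard error of estimate can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it takes the usual formula for the standard error of estimate, the standard deviation multiplied by the square root of (1 - r squared), and it assumes an average intercorrelation of .40, inferred from the statement that .80 is twice that average. The function names are hypothetical and are not part of the original study.

```python
import math


def alienation(r: float) -> float:
    """Ratio of the standard error of estimate to the standard
    deviation of the predicted variable: sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def error_reduction(r: float) -> float:
    """Proportional reduction of the standard error of estimate
    obtained by predicting from a variable correlated r."""
    return 1.0 - alienation(r)


r_ungraded = 0.80            # correlation reported for the ungraded class
r_average = r_ungraded / 2   # ASSUMPTION: ".80 is twice the average" gives .40

reduction_ungraded = error_reduction(r_ungraded)
reduction_average = error_reduction(r_average)
ratio = reduction_ungraded / reduction_average

print(f"reduction at r = {r_ungraded:.2f}: {reduction_ungraded:.3f}")
print(f"reduction at r = {r_average:.2f}: {reduction_average:.3f}")
print(f"ratio of reductions: {ratio:.2f}")
```

Under these assumptions the reduction is .40 for a correlation of .80 against roughly .08 for a correlation of .40, a ratio of a little under five, which agrees with the phrase "several times as great."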


THE I.E.R. ASSEMBLY TEST AS A MEASURE OF A DISTINCT 
ABILITY OR GROUP OF ABILITIES 


The I.E.R. Assembly Test was administered together with 
other tests to 318 girls of ages 12 to 15. Its correlations with 
these other tests and with certain facts from the school records are 
given in Table V on page 22 of Chapter III. These correlations 
demonstrate that the test measures an ability less closely allied to 
ability with ideas and to success in school work than to the ability 
measured by the low-level clerical tests and the Stenquist Assem- 
bly Test. It seems to do for the girls what the Stenquist Assem- 
bly Test does for boys, but not so clearly and emphatically. 


THE DETERMINATION OF THE MECHANICAL INTEREST OF GIRLS 


If it be assumed, as seems reasonable, that one cannot possess 
an interest in anything, however elementary its nature, until he 
knows something about it, we have a basis for constructing inter- 
est tests. This principle, when applied to mechanical things, 
assumes that an individual who is interested in mechanical things 
will have normally and without undue effort absorbed and re- 
tained, to the point of psychological recall, bits of amateurish 
information about mechanical things. 

With this point of view in mind, a list of 200 questions, designed 
to test amateur knowledge of many mechanical situations found 
in women’s mechanical environment, has been collected from a 
number of sources. Each question can be answered with a single 
word. The recall form of question eliminates the guessing 
element of a true-false or multiple choice form of test. The recall 
form, measured by the number of questions contained, invariably 
has a high reliability. 
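
The guessing element mentioned above can be made concrete. The following sketch, offered only as an illustration, computes the score a pupil would be expected to obtain by blind guessing on a selection-type test. The 200-item length matches the list described here, while the four-option multiple choice form and the function name are assumptions introduced for the example. No corresponding figure is computed for the recall form, since it offers no fixed set of answers to guess among.

```python
from fractions import Fraction


def expected_chance_score(n_items: int, n_options: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing
    when each item offers n_options equally likely answers."""
    return Fraction(n_items, n_options)


n_items = 200  # length of the question list described in the text

true_false = expected_chance_score(n_items, 2)        # two options per item
multiple_choice = expected_chance_score(n_items, 4)   # ASSUMPTION: four options

print("true-false, expected chance score:", true_false)
print("four-option multiple choice, expected chance score:", multiple_choice)
```

On these assumptions a pupil with no knowledge at all would be credited with about half the items of a true-false form and a quarter of a four-option form, which is the spurious credit the recall form avoids.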

This test was not used after it was decided to delimit the scope 
of this inquiry by not investigating the relationship of interest and 
ability. It is given here for the benefit of anyone who may care 
to experiment with this type of test. 


GIRLS’ GENERAL TRADE TEST 


1. What tool do you use to tighten up a roller skate which is loose on your shoe? 
2. What does a shoemaker put on his thread before sewing a shoe with it? 
3. What part of a wall clock regulates the time? 
4. With what material are the hammers of a piano covered? 
5. How many sheets of paper does a printer call a ream? 
6. What substance is often used in rain water filters to absorb the foul gases in the water? 
7. What do you call the part of a fishing rod on which the fishing line rolls up? 
8. What would you put on a cork to keep it from absorbing water? 
9. What wheel in a watch regulates the time? 
10. How many valves are there in a hand lift or suction pump? 
11. What liquid is used in an artificial ice-making plant to freeze the water by its evaporation? 
12. What wood is best for making clothes chests? 
13. Of what material are photographic films made? 
14. What do you call the narrow strips of wood which are nailed all over the inside of a house to hold the plaster on? 
15. What tool do you use to sink the head of a finishing nail below the surface of the wood on varnished work? 
16. What do you call the timbers to which the sheeting for a roof is fastened? 
17. What do you call the mudlike substance used to hold window glass in place? 
18. What is mixed with water to make whitewash? 
19. What tool does a bricklayer use to spread his mortar? 
20. When cement plaster is put on the outside of a house for a finish coat, what is it called? 
21. What, besides sand, gravel and water, is used in making concrete? 
22. What do you call the crook in the waste pipe under a wash sink to prevent sewer gas from getting into the house? 
23. What do you call the part of a faucet which you replace in order to stop it from dripping? 
24. In what part of a steam-heating system is the steam made? 
25. What is often provided near the top of a steam radiator to let the air out? 
26. What part of a furnace do you open to make the fire burn hotter? 
27. If the base of a right triangle is 3 feet and the height is 4 feet, what is the length in feet of the third or longest side? 
28. In a quarter-sized drawing, how many inches on the drawing stand for one foot on the object? 
29. What do you call a saw which you use to cut iron rods? 
30. What do you call the pin sometimes used to keep a nut from coming off a bolt? 
31. What do you call a caliper which will measure to the thousandth part of an inch? 
32. What do you call the heavy iron tool on which a blacksmith holds his horseshoe to pound it? 
33. What tool does a blacksmith use to pick out a hot horseshoe from the fire? 
34. How is a broken iron rod mended? 
35. What do you call the twelve-pound hammers used by blacksmiths? 
36. What kind of water is used to fill a storage battery? 
37. What do you call the connection when dry batteries are connected each with its carbon wired to the zinc of the next nearest one? 
38. What is the voltage of an ordinary house electric lighting system? 
39. Of what material are the white tubes made which are used to insulate electric light wires where they go through a thin wooden wall or partition? 
40. What acid is used in the solution of a storage battery? 
41. What safety device is used in house wiring to protect the electrical circuit from too heavy a current? 
42. What do you call the iron pipe through which electric wires are run in house wiring? 
43. What do you divide the number of volts by in order to get the number of amperes which are flowing through an electrical circuit? 
44. What instrument reduces the power line electrical voltage low enough for household use? 
45. What instrument is used to test a storage battery solution? 
46. What must be done to automobile cylinders after they become badly worn? 
47. What part of an automobile engine connects and disconnects the flywheel and the engine shaft? 
48. What instrument is used on the dashboard to show the speed at which an automobile is traveling? 
49. What part of an automobile deadens the noise of the exhaust? 
50. In what part of an engine do the pistons work back and forth? 
51. What substance is sifted on an automobile inner tube to keep it from sticking to the casing? 
52. What would you do to prevent an automobile from kicking back when cranking it? 
53. What kind of bearings are used in a bicycle to prevent friction? 
54. What is the diameter in inches of most bicycle wheels? 
55. What do you call the toothed wheels over which a bicycle chain runs? 
56. How many cycles do most motorcycles have? 
57. What do you call the large wheel on the side of an engine to make it run steadily? 
58. What tool is used to sharpen a garden hoe while at work in the garden? 
59. What is used to hold an axe head tightly on the handle? 
60. Of what kind of stone are grindstones made? 
61. Of what wood are the best axe handles made? 
62. After mixing the baking powder and salt in the flour for biscuits what do you do to the flour mixture before putting in the eggs and shortening? 
63. What liquid, combined with salt, makes a good homemade polish to remove the tarnish from brass? 
64. What often forms on brass so that it will not stay polished? 
65. What finish is often given brass beds in the factory to keep them bright and polished? 
66. What tool is used to make the holes in eyelet embroidery? 
67. How many needles would one need to buy in order to knit a woolen stocking by hand? 
68. What do you call the blunt-point tool which is used for drawing baby ribbon into insertion? 
69. How many stitches are taken between each locking of threads in hemstitching? 
70. What letter on a standard typewriter keyboard is immediately to the right of the letter s? 
71. What do you call the lever which you press on a typewriter to get capital letters? 
72. What do you call the large rubber roll which feeds up the paper on a typewriter? 
73. What part of a telephone switchboard do the plugs fit into? 
74. What do you call the thin pieces of sheet iron behind the mouthpiece in a telephone transmitter? 
75. Of what material are telephone transmitter mouthpieces made? 
76. What stiff material would you use for making a hat foundation? 
77. When hemstitching is cut in two lengthwise, what do you call the points which remain on the sides of the material? 
78. What is the name of the material which you use to set sleeves in a garment? 
79. What do you cut in the seam of a garment to keep the material from unraveling? 
80. On what part of the phonograph does a phonograph record rest while being played? 
81. From what part of the material would you always cut binding? 
82. Of what material are watch springs made? 
83. What cooked vegetable may be used as a substitute for glue or paste? 
84. From what material are the strongest crochet hooks made? 
85. What do you call the laundry machine which is used for smoothing large flat work like sheets and towels? 
86. What chemical is sometimes put in corn to make it keep when canned at home? 
87. What do you put in sour milk to sweeten it? 
88. What do you call the machine on which homemade carpets are woven? 
89. What do you call the thread used lengthwise of strip carpet to hold the filling together? 
90. To what part of a horse’s bridle are the lines or reins fastened? 
91. What one numeral is not shown on any keys of the adding machine? 
92. What do you use to propel a canoe in the water? 
93. What is used on the stern of a sailboat to steer the boat? 
94. How many strings has a violin? 
95. What acid is used in most eyewashes for babies? 
96. What is the best substitute for a part of the eggs in a recipe calling for many eggs? 
97. What do you call the long stitches used to hold two pieces together while sewing them on the machine? 
98. What do you call the spool on which the lower thread of a sewing machine is wound? 
99. What part of the head of a sewing machine keeps the material from slipping? 
100. When using thread of any size between No. 40 and No. 70, what size sewing machine needle would you use? 
101. What do you call the part of a typewriter which makes the ribbon go up and down? 
102. What do you call the part of an electric light fixture into which the bulb screws? 
103. What tool does a paper hanger use to press down the seams? 
104. What do you call the beading which a paper hanger sometimes puts around the room at the height of about 34 feet? 
105. What tool does a shoemaker use to make the holes for the shoe tacks? 
106. What tool does a butcher use to cut a bone or gristle which is a little too small to be sawed? 
107. What tool does a butcher use to put a keen edge on his knife without regrinding it? 
108. What gas is used for fumigating after a contagious sickness or to kill roaches and crickets? 
109. What chemical in bleaching powder gives it its bleaching power? 
110. What liquid would you use to take shellac or varnish off a window? 
111. What liquid would you use to cut the rust off a piece of steel? 
112. What do you drop into homemade wood ashes lye to test its strength? 
113. What do you drop into coffee to make it clear? 
114. What do you add to grease to make soap? 
115. When lard is put in cakes or bread, what do you call it? 
116. What tool would you need to uncouple a hose from a faucet or hydrant if it had stuck tightly? 
117. What do you call the joint in the corner of a picture frame? 
118. What calibre are most target and small game hunting rifles? 
119. What tool does a glazier use to drive in glazier’s points? 
120. Of what metal are glazier’s points made? 
121. What do you dissolve in flour paste to make sizing for paper hanging? 
122. What tool is needed to hang a screen door? 
123. What is the main ingredient of all omelettes? 
124. In what do you cook oatmeal to keep it from scorching? 
125. What besides egg, flavoring and seasoning is used in making custard? 
126. What is the main ingredient of turkey dressing? 
127. What flavoring is used for eggnog? 
128. What are stuck into the fat of the ham in making baked ham? 
129. In what kind of pickles is the brine allowed to ferment? 
130. What kind of cake takes many eggs? 
131. In making cornbread with sweet milk, what would you put in to make it rise? 
132. What ingredient, usually used in cakes, is left out of sponge cake? 
133. What kind of stone is used for heat in a fireless cooker? 
134. Through what kitchen utensil do you rub the apple pulp in making apple butter? 
135. What do you use to clarify grease that has been used? 
136. What is the watery part of sour milk called? 
137. Of what material are the mats made which are used under pots or pans to prevent scorching? 
138. What part of a single harness is fastened to the singletree or whiffletree? 
139. What do you call the part of an ice skate which touches the ice while skating? 
140. What small attachment on a camera shows the picture which is being taken? 
141. What do you call the way in which ribbon is cut so that it will not ravel? 
142. What ingredient is browned in making brown gravy? 
143. What do you call the sauce which is made of flour, water, butter and seasoning only? 
144. At 10 cents a kilowatt hour, how much will it cost to use a 500-watt electric iron for 4 hours? 
145. What do you do to the first coat of varnish before giving it the second coat if you want a fine polish? 
146. What do you use on white pine or spruce to give it a mahogany finish? 
147. What are used under the legs of a dresser so that it can be moved about easily? 
148. What oil is used in mixing gilt paint? 
149. What do you put on baked sweet potatoes to make them brown? 
150. What besides salt, water and color is needed to make salt beads? 
151. What do you put in the hot water in which you are washing rubber rings? 
152. How many X’s stand for confectioner’s sugar? 
153. What do you call the mixture of confectioner’s sugar and water used in candy? 
154. What is the main ingredient of marshmallow? 
155. What do you call a narrow piece of material of contrasting color used for decorative effect as in the vertical seams of a skirt? 
156. What do you call the pump which is used to clear out clogged plumbing fixtures? 
157. What kind of stitch is used on a salt sack so that it will easily ravel? 
158. What color, other than green, could you dip a blue dress into in order to dye it green? 
159. What liquid do you use to set the color when dyeing a dress red? 
160. What is put on the back of a piece of glass to make a mirror out of it? 
161. In making a hat frame, what do you call the wire which you use to fasten together two wires where they cross? 
162. In a hat frame, what do you call the wires which run from the crown to the edge of the brim? 
163. What does a dressmaker use to take her measurements? 
164. What do you call the very long stitches used by a dressmaker to hold the parts together while fitting the dress? 
165. What do you call the stitch used to prevent raveling on the edges of a seam? 
166. Of what color should bedsteads in a sick room be? 
167. What is the process of disinfecting a room by gas called? 
168. What is added to milk to keep it from curdling when making creamed tomato soup? 
169. What kind of bones are used to flavor bean soup? 
170. What vegetables are used in succotash? 
171. What kind of meat is used in Irish stew? 
172. From what part of the beef is porterhouse steak cut? 
173. What meat is used to season baked beans? 
174. What herb is sometimes used to season sausage? 
175. What do you do to cotton batting before putting it in a new sofa pillow to keep it from matting? 
176. What do you do to a fruit jar top to make it unscrew easily? 
177. What are the two principal ingredients used in French dressing? 
178. What do you call the process of setting the colors after hand painting on china? 
179. What liquid is applied to charcoal drawings to fix them? 
180. What is used to drive the tools in embossing leather? 
181. Of what material is the point of a pyrographic needle made? 
182. Of what material are the victrola needles made that give the softest tones? 
183. Of what material are the best bathtubs made? 
184. What should you do to an oil lamp wick before extinguishing the light? 
185. What happens to iron saucepans when they are put away damp? 




56 Tests for Vocational Guidance of Children 


186. When clothes are left in a warm, wet condition for several hours, what do you call the injurious result? 
187. What common household liquid will remove paint easily from window-glass? 
188. What common article of food is used to “set” pink, green or black colors when dyeing? 
189. When sleeves are larger than the armhole where should the greatest amount of fullness come? 
190. What do you call celluloid glasses worn to protect the eyes against sun-glare and dust? 
191. What do you put under a hot dish of food to keep the heat from injuring the table? 
192. After giving a baby its bath, what should you put on it to prevent chafing? 
193. What is the name of the small vegetable which looks like an onion and is used to season food? 
194. What vessel besides a coffee pot is most commonly used to make coffee? 
195. What do you call the piece of material which you sew on a worn place in a garment? 
196. What do you call the slimy mass which forms in vinegar when it stands for a long time? 
197. In making bread what do you put in it to make it rise? 
198. What fruit is frequently cooked in with bread or rice pudding? 
199. Of what material is the best heavy thread made which is used for sewing on overcoat buttons? 
200. What tool is quickest to use in making whipped cream? 
201. What cloth is made from the hair of angora goats? 
202. What do you sometimes put on thread to make it stronger for sewing on buttons? 
203. What do you do to cream in order to make it whip? 
204. What do you call the boiler which will keep rice from scorching? 


A TRUE-FALSE TEST OF GIRLS’ MECHANICAL INTEREST 


One of the values of the true-false test is that bits of information 
which cannot be used to advantage in other forms of tests can 
readily be adapted to this form of test. One can also readily 
raise the question as to which of two practices is the better, when 
there are reasons why one should be preferred to the other. 

The following list, which is but a slight beginning, shows how 
readily this form of interest test can be constructed. Practically 
all of the questions in the Girls’ Recall General Trade Test can 
readily be adapted to this form of test. Like the former, this test 
was not used after it was decided not to include a study of me- 
chanical interest in this investigation. 


TRUE-FALSE GIRLS’ GENERAL TRADE TEST 


1. No. 50 sewing thread is a very coarse sewing thread. 
2. Only three needles are used in knitting the heel of a woolen stocking. 
3. Rose color is a kind of purple. 
4. Lavender is a shade of yellow. 
5. A bodkin is a very small darning needle. 
6. Basting thread is stronger than sewing thread. 
7. The lengthwise threads of cloth are stronger than the crosswise threads. 
8. The eye of a sewing machine needle is in the top. 
9. The flat part of a sewing machine needle is at the end which contains the eye. 
10. A white dress may be dyed pale pink. 
11. A dark blue dress may be dyed pale green. 
12. A typist is a person who sets type in a printing office. 
13. The soles of shoes are sometimes made of paper. 
14. An automobile may be brought to a dead stop within five feet when the machine is traveling at fifty miles an hour. 
15. Gingham is used principally for making tablecloths. 
16. Damask is much used in evening gowns. 
17. Wisteria blooms at the same time as clematis. 
18. The Hoover is a brand of breakfast food. 
19. Postum is a tooth paste. 
20. Crisco is a scouring cleaner. 
21. A knitting needle ends in a hook. 
22. New potatoes are scraped instead of peeled. 
23. New potatoes make good mashed potatoes. 
24. Fruit cake is best if made only one day before eating it. 
25. Stale bread is better than fresh bread in turkey dressing. 
26. The U. S. Government allows the use of benzoate of soda to preserve catsup but not milk. 
27. Beet sugar is cheaper than cane sugar. 
28. Beet sugar is sweeter than cane sugar. 
29. “Soft white” sugar is always granulated. 
30. Ammonia in dishwater will cut the grease. 
31. Mustard is a good emetic. 
32. Goldenrod blooms in the spring. 
33. Zinnias are one of the earliest blooming spring plants. 
34. Bloodroot is a woods flower, blooming in the fall. 
35. Radishes are ready for eating five weeks after planting. 
36. Round steak is the most expensive cut of steak. 
37. Bacon is cut from the sides of the hog. 
38. Neck cuts are used mostly for frying. 
39. Tripe is made from sweetbreads. 
40. If ashes are allowed to accumulate in a grate they will cut off the draught. 
41. A small pipe in a kitchen sink is less likely to choke with grease than a large one. 
42. Grease is easily removed from drain pipes by the use of potash and hot water. 
43. Fresh grapes may be kept for several weeks by packing them in sawdust. 
44. Enameled iron is sometimes used for making bathtubs. 
45. A needle shower is always a cold shower bath. 
46. Waste pipes should never be made from cast iron. 
47. Brass pipes are not corroded by ordinary water. 
48. Water pipes usually burst when the water in them freezes. 
49. A figured carpet wears longer than a plain carpet of the same quality. 
50. If a person’s ears are frozen rubbing them with snow will take out the frost. 
51. A solution of water and baking soda is good for scalds and wasp stings. 
52. Confectioner’s sugar always is marked with five X’s in a row. 
53. Goods bought in bulk are usually cheaper than those bought in the package. 
54. The best artificial light may be secured from candles. 
55. Candles may be purchased by the pound or in packages. 
56. The charred parts of a burned lamp wick should be pinched off. 
57. If an oil lamp is filled entirely full there is danger of explosion. 
58. The wick to an oil stove should be turned up as far as possible before lighting it. 
59. A shopping list with approximate prices and quantities is a great hindrance to efficient shopping. 
60. It is more expensive to buy perishable goods in large quantities. 
61. Food which is “in season” is always more economical than food which is “out of season.” 
62. Thoughtful consideration of tradespeople is not a good policy. 
63. It is inadvisable to have stated periods for settling household accounts. 
64. The quantity of food needed for a given number of people may usually be found from recipes in a cook book. 
65. Milk is sometimes the means of carrying and spreading disease. 
66. Turpentine will remove paint spots from a glass window. 
67. If water splashes against the baseboard in scrubbing it may turn the varnish white. 
68. Lamb is less nutritious than mutton. 
69. Fresh oysters are good to eat the year around. 
70. Fresh pork should be purchased in the warm months of the year. 
71. Liver, kidney and tripe should be used immediately after purchasing. 
72. Young fowls may usually be purchased for less money per pound than old fowls. 
73. Fresh eggs will float if put into water. 
74. The most desirable potatoes are those having many deep eyes. 
75. Very small potatoes lose a great deal of weight in the peeling. 
76. If silver spoons are left in fried eggs for several hours they will tarnish. 
77. Fruit should always be stored in a dark, cool place. 
78. A good paste may be made from flour and water. 
79. When packed in salt, eggs will keep fresh for a long period of time. 
80. When storing soap, care should be taken to stack the bars so air may pass between them. 
81. Hard and dry candles are less liable to burn away quickly than soft candles. 
82. Plates should always be removed from the right-hand side of the diner. 
83. Sand is a good scour for saucepans. 
84. Vinegar and hot water will remove the smell of onions from saucepans. 
85. Sugar and water boiled together will make an excellent syrup for griddle cakes. 
86. Nickel saucepans are excellent for cooking sweets. 
87. The use of soda to clean aluminum will turn the metal black. 
88. Pine tar bags are good moth preventives. 
89. Ammonia is good for cleaning linoleum. 
90. It is not a good plan to remove pictures from the walls before cleaning the walls. 
91. Backs of brushes are sometimes made from tortoise shell. 
92. It is a good practice to dry a hair brush near an open fire. 
93. Combs are sometimes made from gutta-percha. 
94. It is not a good plan to keep household brushes hanging up when not in use. 
95. A whisk broom is a long handled household broom. 
96. Dust should never be emptied from a carpet sweeper immediately after using it. 
97. A vacuum cleaner is sometimes used in place of a broom. 
98. A whisk broom and a carpet broom are intended to clean the same articles. 
99. It is economical to purchase cheap brushes because they will not need to be replaced frequently. 
100. Sponges are manufactured from the pulp of certain trees. 
101. Coarse sponges are less expensive than soft sponges. 
102. Sponges should never be rinsed after using them with soap. 
103. Chamois is the softest household leather obtainable. 
104. Chamois should always be washed in cold water. 
105. A warm, dry room is the most preferable place for keeping household linens. 
106. Linen damask is a figured fabric sometimes used for curtains. 
107. Sheets are sometimes made from unbleached cotton material. 
108. Linen huckaback is a good material for bedroom towels. 
109. Hand towels made from cotton are more serviceable than those made from linen. 
110. Face towels and bath towels are usually about the same size. 
111. Mending the laundry before putting it away is not so satisfactory as mending it only as it is used. 
112. Selvage threads will break easily when pulled. 
113. Weft threads pass across the warp from side to side of the material. 
114. A mangle is a machine used to iron flat work. 
115. Ironed clothes are sometimes put on a clothes horse. 
116. Borax and water should be used to remove tea stains. 
117. Clothes should never be hung with the thick part uppermost. 
118. Silk should always be wrung thoroughly with the hands when washing it. 
119. When a person has burned himself badly, olive oil and borax solution is a good remedy. 
120. Iodine applied too frequently to the body will blister the skin. 
121. Receipts for money should always be destroyed at once. 
122. A pass book may be purchased from a bank at a small cost. 
123. In sewing, a pattern should never be used when the garment is cut on the fold of the material. 
124. A garment may sometimes be cut larger than the pattern by making a fold in the material before cutting. 
125. Garments cannot be cut very accurately when the pattern is pinned to the material. 
126. Neck and wrist bands and belts should always run on the selvage of the material. 
127. Basting is a permanent, durable stitch. 
128. When the neck of a garment is too large, it may sometimes be made to fit by making small tucks. 
129. When making a French seam, you always make the first seam on the wrong side of the material. 
130. Thread size 100 is the coarsest thread made. 
131. Thread size 16 is a good size for working buttonholes in baby clothes. 
132. When sewing lace onto an edge always hold the lace next to you. 
133. Veal has higher food value than beef. 
134. Peanuts grow on the roots of peanut plants. 
135. Parsley is a vegetable which grows on a vine. 
136. Orangeade is a sort of jelly made from the rinds of oranges. 
137. Olive oil is sometimes used as a sort of medicine. 
138. The largest pods of okra are the most tender. 
139. Oats is a well-known cereal. 
140. A napkin is a sort of towel used at the table. 
141. Some soaps contain naphtha. 
142. Fresh fruits are digested more quickly than meats. 
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To make table linen very smooth, iron it when per- 
fectly dry. 


Pieces of much worn linen make excellent bandages. 


Lamp chimneys and globes should always be washed 
in clear, cold water. 

Woolen cloths are better than cotton cloths for polish- 
ing. 

Varn'shed furniture should always be cleaned with a 
damp cloth. 

Before burning sulphur in a room, all metal articles 
should be removed from the room. 

A piano keeps its tone better in a steam-heated apart- 
ment than in one with hot-air heat. 

To keep cut flowers fresh keep them in a very warm 
room. 

Partly ripe tomatoes will ripen quickly if put in a 
sunny window. 

Partly ripe bananas will ripen best if wrapped in 
paper and put in the dark for a day or so. 

Beading is a strip of cloth trimmed with beads. 


Alfalfa is a woolen material used for suits and dresses. 


A barrette is a removable bar in a gate. 

A brassiére is a kind of cooking kettle. 

Sulphur is added to rubber to preserve the elasticity 
of the rubber. 

Milk bottles are best cleaned by washing in clear, hot 
water only. 

A cleaver is used to fasten together two pieces of 
board. 

Cluny is a kind of lace. 

A coffer is a pot for making coffee. 

Canned currants will keep if mashed and added to the 
same weight of sugar without cooking. 


A fowl should be cooked immediately after killing it. 


A Welsh rarebit is a steak only slightly cooked. 
CHAPTER V 


TESTS OF ABILITY WITH CLERICAL ITEMS AND 
PROCEDURES¹ 


Our work on clerical tests began with the evaluation of a series 
which had been given in 1920 by Dr. L. W. Sackett to soldiers in 
the Camp Grant schools. 

From these data obtained for bookkeepers in the army school we 
were able to use the partial correlation method in the weighting of 
nine variables to form a provisional bookkeeping placement ex- 
amination. This was never used, since work was immediately 
begun on a thorough revision of a set of tests later called the Unit 
Tests, which could be given in a shorter time limit, and could be 
quickly and more objectively scored. 

The criterion of ability to progress consisted of four variables: 

I. Teacher's estimates of “potential ability.” The teacher 
rated the pupils in letters, arbitrarily later given numerical scores 
as follows: 


A=10; B=8; C+=6; C=5; C−=4; D=2; E=0. 


II. A combination was made of the arithmetical sum (thus 
weighting each directly according to its σ) of teachers' estimates of 
“morale or interest, intelligence, mathematics, and language,” the 
ratings being each given on a scale of 1 to 5. The arithmetical 
sums were arbitrarily given credit on a scale of 0 to 10 as follows: 

III. School marks on a scale of 0 to 10. 
IV. Measures of absolute school progress. 


These were lengths of line in a grade book, the lengths of line 
indicating the relative progress in different “fundamental ele- 
ments or operations” of bookkeeping theory. With equal 
practice, those who made the largest amount of absolute progress 
were farthest along toward final or graduating proficiency. These 
lengths of line were mechanically summated by means of a pair of 
dividers, and scores were given, on a basis of 0 to 10, for total 
lengths of line in units as follows: 


22-34 = 0; 35-64 = 2; 65-168 = 5; 169-187 = 8; 188-214 = 10. 


1 The reader who is interested only in the results finally attained may omit all of 
this chapter save the pages which report the methods of selection of tests for the 
I.E.R. General Clerical Test (pp. 73 to 84), and the results with New York City 
school children (pp. 96 to 99). 
It will be noted that all of the above variables have been 
changed into ranks on a basis of 0 to 10, assuming a normal 
distribution. The σ's of the four variables are thus practically equal. 
The scores were then added. Thus, with σ's equal, extreme 
weight was given to school marks, measures of absolute progress, 
and estimates of potential ability as against the weight given to 
interest, intelligence, mathematics and language. The summated 
scores were all divided by 3 to yield smaller numbers for ease in 
calculating correlations. These scores become variable 1, the 
criterion, in the following discussion: 
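The construction of the criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and because the conversion table for component II is not reproduced here, that component is taken as an input already expressed on its 0 to 10 scale.

```python
# Letter ratings for "potential ability" (component I), as listed in the text.
LETTER_SCORES = {"A": 10, "B": 8, "C+": 6, "C": 5, "C-": 4, "D": 2, "E": 0}

# Bands of summed line length (component IV) and the 0-10 score for each band.
PROGRESS_BANDS = [
    (22, 34, 0),
    (35, 64, 2),
    (65, 168, 5),
    (169, 187, 8),
    (188, 214, 10),
]


def progress_score(total_length):
    """Return the 0-10 score for a summed line length from the grade book."""
    for low, high, score in PROGRESS_BANDS:
        if low <= total_length <= high:
            return score
    raise ValueError(f"length {total_length} is outside the tabulated bands")


def criterion_score(letter, estimates_score, school_mark, total_length):
    """Sum the four 0-10 components and divide by 3, as the text describes.

    estimates_score is component II already converted to its 0-10 credit
    (hypothetical input; the conversion table is not reproduced here).
    """
    total = (
        LETTER_SCORES[letter]
        + estimates_score
        + school_mark
        + progress_score(total_length)
    )
    return total / 3
```

Because each of the four components runs from 0 to 10, the resulting criterion score lies between 0 and 40/3.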


The variables evaluated, by variable numbers, are: 


1. Criterion of bookkeeping. 

2, 3, 4. Tests not evaluated. 

5. A Trabue Completion test of the usual form made from 
selected Trabue sentences. This was test 5 of the Sackett Series. 
It was later revised, use being made of a new principle in the 
reactions of subjects whereby the initial letter of the word to be 
completed is given, and becomes test 4 of the Unit Tests. 

6. Substitution of code letters for numbers as in store price 
codes. This, when revised, becomes test 9 of the Unit Tests. 
This was test 6 of the Sackett Series. 

7. Filing. Words are marked with the number of the letter 
group under which they would be filed in the letter scale. This, 
when revised, becomes test 10 of the Unit Tests. This was test 7 
of the Sackett Series. 

8. Copy Checking. A test to detect and correct errors in copy- 
ing totals of arithmetical additions to a vertical column, and in 
checking the correct transfers. This becomes test 7 of the Unit 
Tests. This was test 8 of the Sackett Series. 

9, 10. Tests not evaluated. 

11. Number Copying. A test of copying numbers of increasing 
number of digits into blank spaces on the back side of the test 
sheet. This is test 11 in both the Sackett Series and the Unit 
Tests. 

12. Test not evaluated. 

13. Army Alpha. Form 6. 

14. Army Arithmetic Test. A 20-minute test in the four 
fundamentals, modeled after the Woody Arithmetic Tests. 

15. Army Reading Test. Time, 14 minutes. 

16. Mechanical Interest Test. A test aiming to measure 
familiarity with the use of common mechanical tools. This test 
was given as a routine test for placement of soldiers in mechanical 
courses. No time limit; about a 45-minute examination. 


There were 27 cases in the bookkeeping group. All scores in all 
the variables were used as available, i.e., as ratings comparable to 
the Army Alpha letter ratings arbitrarily given scores as follows: 


A=10; B=8; C+=6; C=5; C−=4; D=2; E=0. 

The intercorrelations, together with σ's and averages, are shown 
in Table XV. 


TABLE XV 

THE INTERCORRELATIONS, STANDARD DEVIATIONS AND AVERAGES OF 27 BOOKKEEPING 
STUDENTS IN THE SEVERAL VARIABLES OF THE TEXT * 

(The numbers in parentheses are the numbers of the variables 
of the Unit Test Series.) 

* The Probable Error of the Correlation Coefficients in Table XV may be found from the 
following table: 
The regression equation resulting is as follows: 


[Regression equation: the numerical weights are not legible. The 
equation expresses the criterion as a weighted sum of the nine 
variables listed below, each divided by its σ, with 
$r_{IC} = .72 \pm .06$.] 

VARIABLE | TEST                | UNIT NO. 
X5       | Trabue              | 4 
X6       | Substitution        | 9 
X7       | Filing              | 10 
X8       | Copy Checking       | 7 
X11      | Number Copying      | 11 
X13      | Army Alpha          | 
X14      | Army Arithmetic     | 
X15      | Army Reading        | 
X16      | Mechanical Interest | 


It is interesting to note that not only do the placement tests for 
academic subjects, Alpha (13), Arithmetic (14), and Reading (15) 
correlate lower with the criterion than do the clerical tests, 
variables 5, 6, 7, 8 and 11, but also that the importance of each of 
these academic tests is less than the least valuable of the clerical 
tests in the regression equation. 

The magnitude of the multiple correlation coefficient, 
$r_{IC} = .72 \pm .06$, is such as to indicate much promise from a composite series 
of such variables in placing people in bookkeeping courses. 


Since it is apparent that the Army Alpha, Arithmetic and 
Reading tests add but little value to the efficiency of the place- 
ment relative to the amount of time taken, it seems desirable to 
find the multiple correlation coefficient for the clerical tests alone, 
since these can be given in a fraction of the time required for all 
the test variables. The five clerical tests alone, weighted by the 
same regression weights, yield $r_{IC} = .70 \pm .07$. This would be 
slightly improved upon were one to calculate the new regression 
weights, involving the five variables only. 

Thus we might take a short examination composed of Unit Tests 
Nos. 4, 9, 10, 7 and 11 and with this group secure a correlation of 
.70 between the criterion of bookkeeping ability and the composite 
score weighted according to the weights of the above regression 
equation, provided the revision, not only in the alteration of the 
form of the tests but of the time limit, does not operate to destroy 
the above existing interrelationships of the tests as given in the 
table of intercorrelations. 
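The correlation of a weighted composite with the criterion can be computed directly from a table of intercorrelations, which is how the effect of dropping tests (giving them zero weight) may be checked. The sketch below uses the standard formula for the correlation of a weighted sum of standardized variables with another variable; the function name and the numbers in the usage note are illustrative assumptions, not values from Table XV.

```python
import math


def composite_correlation(weights, r_with_criterion, r_between_tests):
    """Correlation of a weighted sum of standardized tests with the criterion.

    weights[i]             weight applied to test i (standard-score units)
    r_with_criterion[i]    correlation of test i with the criterion
    r_between_tests[i][j]  correlation of test i with test j (1.0 on the diagonal)
    """
    n = len(weights)
    # Covariance of the composite with the (standardized) criterion.
    numerator = sum(weights[i] * r_with_criterion[i] for i in range(n))
    # Variance of the composite.
    variance = sum(
        weights[i] * weights[j] * r_between_tests[i][j]
        for i in range(n)
        for j in range(n)
    )
    return numerator / math.sqrt(variance)
```

With two uncorrelated tests that each correlate .50 with the criterion, equal weights give a composite correlation of about .71, while setting the second weight to zero leaves .50.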

The results of this experiment were so encouraging that a 
thoroughgoing revision of the tests was undertaken, the subjects 
being the pupils of a large business college. These students are of 
a more selected range of ability, have carefully kept records, and 
are quite typical of the type of persons who enter industry through 
the route of a business college training in typing, stenography, or 
bookkeeping. It is quite evident that with a group of more 
limited range of ability, lower correlations will be found for equally 
meritorious tests; but, if these are computed on a more reliable 
criterion, they are preferable to the larger correlations obtained 
on the soldier group. Again, the five tests here combined may not 
be at all meritorious in predicting ability in stenography and 
typing. The correlations of school marks in stenography with the 
regression prediction of bookkeeping fitness total scores on both 
the entire set of nine tests and the shorter set of five tests for 
thirteen students in the army school of stenography were com- 
puted with the results of Table XVI. 


TABLE XVI 


CORRELATIONS OF WEIGHTED TEST SCORES WITH MARKS IN BOOKKEEPING 
AND IN STENOGRAPHY, CAMP GRANT SOLDIERS 


GROUP                        | NO. OF CASES | $r_{IC}$, ALL NINE TESTS | $r_{IC}$, WITH ONLY FIVE TESTS 
Criterion of Bookkeepers     | 27           | .72 ± .06                | .70 ± .07 
Stenographers' School Marks  | 13           | .04 ± .19                | .01 ± .19 





The five tests as above weighted yield a correlation of .01 
between school marks and the test composite in the case of the 13 
army students in stenography; and the nine tests a correlation of 
.04, thus proving these weightings to have no value for predicting 
these stenographic marks. Different tests or different weights of 
the present tests are needed; or, probably the reliability and the 
validity of the stenographic school marks are at fault. 
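The ± figures attached to these correlations agree with the classical probable-error formula for a correlation coefficient, P.E. = 0.6745 (1 − r²) / √N. That this is the formula actually used is an assumption, although it reproduces all four values quoted for the bookkeeping and stenography groups. A minimal sketch, with a hypothetical function name:

```python
import math


def probable_error(r, n):
    """Classical probable error of a correlation coefficient r from n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# The four coefficients discussed above, with their numbers of cases.
for r, n in [(0.72, 27), (0.70, 27), (0.04, 13), (0.01, 13)]:
    print(f"r = {r:.2f}, N = {n}: P.E. = {probable_error(r, n):.2f}")
```

With only 13 cases the probable error is about .19, so correlations of .04 and .01 cannot be distinguished from zero.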

For practical work, the procedure should be simplified to save 
the necessity of making so many transformations of data before 
obtaining the final fitness scores. 


THE UNIT TESTS 


It became evident that for army use, and for vocational guid- 
ance use in general the tests must be of such a nature as to be 
given by teachers or other persons not specifically trained in 
psychological test techniques. Consequently, it was desirable 
to reduce to the minimum the amount of necessary preliminary 
training on the part of the examiner. This necessitated first of all 
that the directions should be read by the subjects rather than by 
the examiner, if uniform results were to be secured from the work 
of different examiners. It was decided that the directions should 
be printed as a part of the test and included in the test time, to be 
read silently by the test subject, and work on the test to be begun 
just as soon as he had completed the directions. This means that 
each test has in it the element of ability to understand printed 
directions. Whether or not this is an advantage or disadvantage 
remains to be proved. At least one may say that all of the con- 
ventional tests involve the requirement of ability to understand 
verbal directions. The desirability of having the directions a part 
of the test, rather than the usual verbal directions read by the 
examiner, is a question to be settled, not on the basis of a priori 
reasoning, but rather on the basis of the correlations to be obtained 
with an adequate criterion. The necessary experiment has not 
been performed. Lack of ability to understand the directions of 
a test will result in too many zero scores. In industry we have 
noted not a single case of a person working in clerical work who 
had not at least a sixth grade education; consequently, the reading 
ability of clerical workers in general is undoubtedly superior to 
sixth grade reading ability. This makes the argument for the 
necessity of verbal directions, given by the examiner, rather 
ineffective. 

It would also be very desirable if the test could be given with an 
over-all time limit, that is, by the work-limit method. In a 
vocational guidance bureau, for instance, the applicants are likely 
to come straggling in one at a time and it would be quite laborious 
for the examiner to keep records of the time on each test. An 
empirical formula for changing time-limit tests over to a work- 
limit method, by consideration of the partial correlation weights 
and the variabilities of the different tests, has been considered. 
If such a technique should prove successful, one would be able so 
to vary the number of test elements in each of the tests that the 
tests would be properly weighted with respect to each other in the 
total scale by means of the relative number of elements in the 
several tests. Neither sufficient time nor adequate data have 
been available for carrying this investigation to its conclusion. 
Consequently, time limits for the several tests have been estab- 
lished by choosing such a time limit that over 99 per cent of 
exceptionally good business college students are unable to finish 
in the time limit assigned. 

A ranking, by a number of test workers and psychologists, of 
tests in their order of predictive value in predicting stenographic 
ability had demonstrated the fact that there is little agreement 
among such test workers in regard to the relative validities to be 
expected from the different tests. Nearly all expressed them- 
selves as unwilling to make a prediction of the correlation that 
might be obtained between the various tests and an adequate 
criterion of stenographic ability. It seemed desirable, therefore, 
to construct a large number of tests varying from very routine and 
non-verbal on the one hand to very abstract and academic on the 
other, and then to administer these to a group of business college 
students upon whom adequate criteria of ability to progress in 
acquiring these subjects could be obtained. From this conviction 
it was but a step to the development of the Unit Test idea, already 
foreshadowed in part by the work of Link. 

As originally constructed, the Unit Test series consisted of 32 
tests, each of which had the directions included as a part of the 
test. The series ranged in test content from very routine to very 
abstract material. Each test was given a number rather than a 
name. 

The original Unit Test number is always to be printed or 
mimeographed at the left of the scoring box on any Unit Test, and 
will thus enable adequate comparisons of records to be made of 
test scores when the tests are given in different scale combinations. 
Thus, this plan purported to be the beginning of a test plan which 
might extend over a period of years. Once the tests which were 
to make up the Unit Test series were determined upon, adequate 
time limits determined, clear directions decided upon, the tests 
might be made up in mimeographed form and kept available in 
large numbers on the shelves of the laboratory. Since a general 
set of directions (and qualification questionnaire sheet) would also 
apply, it would be possible upon an hour’s notice to assemble into 
a test booklet any selection of 15 to 30 tests, or more, that might 
in the judgment of the experimenter be expected to yield positive 
correlations with the particular test criterion for which one might 
be called upon to construct a potential ability scale. Dr. Sackett 
and the writer had the assistance for final review in this work of 
Drs. Otis, Holley, Rice, O’Rourke and Teachout. The series 
was subsequently enlarged by the addition of mechanical tests, 
reading tests and others, to contain 42 tests. 

Numerous additional requirements were decided upon, most of 
these being considerations of the mechanical make-up of the 
printed or mimeographed page for ease and speed in scoring and 
to secure compliance with the directions on the part of the test 
subjects. For instance, in those tests where the subjects are 
inclined to work across the page rather than in columns according 
to directions, it has been found that a heavy vertical line dividing 
the page into columns will tend to cause the subject to work in 
columns rather than in rows. Innovations or improvements upon 
past test technique used in these tests are given below: 





1. Scoring boxes are provided in the lower right-hand corner of 
each mimeographed page, allowing a ready tabulation of 
attempts, wrongs and rights on each test. 

2. To the left of this box is placed the original Unit Test number 
of the test. This is always the same for a given test in what- 
ever combination or order it may be used, allowing a quick 
comparison of results on the one administration with others at 
previous or subsequent times. (The Unit Tests may be 
numbered serially at the top of the pages, according to the 
order in which they appear in a given scale.) 

3. Where the subject is to work in columns, rather than in rows 
across the page, heavy vertical lines automatically guide the 
subject to work in the columns as versus the rows. 

4. The administration time of each test is printed in the same 
position near the top of the page in each test, so that the test 
subject may secure a relative idea of the speed required on the 
test. 

5. In all tests which require such, samples are always given 
immediately beneath the directions and administration time. 
It is the aim to have three samples on each test, the first two 
being easy samples and the last more difficult, in case there 
is a varying difficulty in the items. This most difficult sample 
should be approximately as difficult as the most difficult test 
reaction which the subject will have to make. The answers 
to the sample questions which indicate the reaction to be per- 
formed by the test subject are always written in script in order 
to indicate to the subject the proper place for his answer. 
The samples always appear relatively at the same place on the 
page and are set off from the directions above and the test 
beneath by heavy horizontal lines; the test subject thus is 
never in doubt as to where to look for the sample reactions. 


6. Practice pages, with a short time limit, are given for such tests 
as require involved explanation of the directions. 

7. At the foot of each page appears in italics the direction: 
Wait for the signal before turning the page. Where there are 
two pages to a test this signal has been changed to Turn over 
to the next page. More on the next page. It has been the 
attempt throughout to anticipate the points at which the test 
subject might wish to have a question answered and to have 
the answer to his question printed on the test blank at just 
that point. 


8. If the test elements are not numbered on a test, the maximum 
credit in terms of cumulative number of possible rights in each 
column is printed at the foot of the columns. 


9. In all cases where possible, the questions are numbered on 
both the left and the right margins of the page for ready 
reference in scoring. At the right-hand side of the page, the 
question numbers are staggered sufficiently to allow the odd- 
numbered questions to appear in a column on the left and the 
even-numbered questions in a column on the right. This 
allows one to compute the reliability of a test by the 
odds-evens method, since the “wrongs” or the “rights” in either 
column may be readily added up at a glance. Where the 
subject is to place his answer on a short horizontal line in 
columns, the answer spaces are also staggered, thus allowing 
more writing space for the answer. 

10. At the completion of the test material and immediately above 
the “Wait for the signal” sign, a heavy horizontal line is 
always placed, indicating that the test is finished at that point. 

11. Where the subject works in horizontal rows, and frequently in 
columns also, the test material is grouped by horizontal spaces 
into groups of five test elements. This enables the test subject 
better to keep his place, and also is an aid to the scorer in 
scoring. 

12. If there are several test reactions in a given horizontal row, as 
in encircling the pairs which make 10, the numbers placed in 
parentheses at the left of the rows indicate the cumulative 
sums of the possible credits in the preceding rows. This 
makes possible very quick computation of the number of 
attempts. 

13. In such a test as the above, where several test reactions occur 
in a row, an arbitrary keying device for determining the 
credits in that row is employed in the right-hand column of 
figures or letters; thus in a test involving figures at the 
extreme right of each successive row, the figures may readily 
be made to show the number of possible correct responses in 
each of the rows respectively. 

14. In general, the tests aim always to require an actual mark to 
be made in order to receive any credit; i.e., the tests aim to 
avoid the type of test wherein one checks only the errors, 
leaving blank those that are right or vice versa. The scorer 
is quite confused by such procedure. One can readily have 
the test subject write “C” for those items that are correct and 
“W” or “X” for those that are wrong. 

15. It is very desirable that few dotted guide lines be used, since 
some subjects are prone to write their answers on the dotted 
line. Consequently, it has been the aim to eliminate these 
wherever possible by always using solid lines for the answer 
space and by so arranging the position of the question, or 
dividing it up in questions extending onto two lines, that the 
end of the question will be very close to the answer space 
without any intervening guide lines. In multiple choice tests 
it is very desirable to line up the left-hand end of each choice 
in columns, dividing the direction part of the element into two 
lines if necessary and placing the choices opposite the second 
line. 

16. In questions which extend over more than one line, the answer 
space is always placed on a level with the last line of the 
question. 
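The odds-evens reliability mentioned in the list above can be illustrated as follows. The sketch correlates each subject's rights on the odd-numbered questions with the rights on the even-numbered questions, then applies the Spearman-Brown step-up to estimate the reliability of the whole test. The step-up is an assumption added here for illustration; the text names only the odds-evens method. Function names and data layout are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odds_evens_reliability(item_rights):
    """Estimate whole-test reliability from odd and even half scores.

    item_rights holds one list per subject, with 1 for a right and 0 for a
    wrong or omitted answer, in the printed order of the questions.
    """
    odd_totals = [sum(row[0::2]) for row in item_rights]   # questions 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_rights]  # questions 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown step-up from half-length to full-length reliability.
    return 2 * r_half / (1 + r_half)
```

The staggered margin numbers described in item 9 serve exactly this purpose: the two half scores can be read off the page without re-sorting the items.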


Since adequate criteria of ability to progress could be secured, 
it was decided to attempt securing a series of tests which by differ- 
ential weights would predict differential capacity to progress in 
each of the three business college courses, stenography, typing, 
and bookkeeping. It was also possible to construct an average 
criterion, or average of the sigma positions in each of the other 
three courses, and so to predict “general business ability” in much 
the same way as the average of abilities in a number of grade 
school subjects is commonly called “general scholarship.” 
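An average of sigma positions of this kind can be sketched as below. The sketch assumes that every student has a criterion score in all three courses and that the population standard deviation is used; neither point is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def sigma_positions(scores):
    """Express each score as a deviation from the mean in units of sigma."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def general_business_ability(stenography, typing, bookkeeping):
    """Average each student's sigma position across the three course criteria.

    The three lists must list the same students in the same order.
    """
    per_course = [sigma_positions(c) for c in (stenography, typing, bookkeeping)]
    return [mean(student) for student in zip(*per_course)]
```

Converting to sigma positions first keeps a course with a wide spread of raw scores from dominating the average.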

The excellent criteria which have been secured are the results 
of many days of painstaking effort in compiling from teachers’ 
grade books, the school register, the attendance book, and various 
other sources, including the results of performance tests specially 
devised for the purpose, the facts which bear on success in business 
college courses. The instructors were unusually coöperative 
not only in making available their own records, but in giving and 
scoring performance tests and looking up the records of individual 
students. In all, undoubtedly more time was spent in obtaining 
the final criterion scores than was consumed in both giving and 
scoring the 32 tests. This fact is mentioned in order to show the 
amount of work necessary to determine an adequate criterion. 
These criteria undoubtedly have a high statistical reliability, 
since they contain so many items, each of which represents the 
results of much practice on the part of the student. With a low 
reliability of the criterion one cannot hope to determine the best 
series of tests for predicting ability to progress in the several 
courses. We feel that we have largely eliminated this factor in 
the stenographic criterion; the typing criterion is probably less 
reliable, and the bookkeeping still less. The order of efficiency of 
the final scales is in the same order, which might suggest that we 
need but to increase the reliability of our criteria in order to in- 
crease the validity of our tests. In fact the maximum correlation 
of our weighted scale of tests, C, with the Criterion, I, is given by 
the formula, $r_{IC} = \sqrt{r_{II}\, r_{CC}}$, in which $r_{II}$ is the reliability 
coefficient of the criterion, and $r_{CC}$ the reliability coefficient of the 
combination of tests. By increasing the reliability of either the 
criterion or the tests we increase the maximum limit of the 
correlation of our tests with the criterion, $r_{IC}$. Whether the actual 
value of $r_{IC}$ will always, or even usually, increase is not known, 
although it seems likely that an increase in $r_{IC}$ will automatically 
follow an increase in the reliability of either the criteria or the 
tests. It is apparent, however, that there are cases where there 
would be no increase, and other cases where the increase would be 
small. A test which correlates 0 with a criterion will not correlate 
any higher no matter how much the reliability of both the crite- 
rion and the tests is increased. 
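The ceiling stated above, the square root of the product of the two reliability coefficients, is easily tabulated. The sketch below is illustrative only; the function name and the reliabilities in the example are hypothetical and are not taken from the study.

```python
import math


def max_validity(criterion_reliability, test_reliability):
    """Upper limit of the test-criterion correlation given both reliabilities."""
    return math.sqrt(criterion_reliability * test_reliability)


# Illustrative values: raising the criterion reliability raises the ceiling.
for r_ii in (0.49, 0.64, 0.81):
    print(f"r_II = {r_ii:.2f}, r_CC = 0.90: ceiling = {max_validity(r_ii, 0.90):.2f}")
```

The ceiling is a limit and not a prediction: a test that correlates 0 with the criterion stays at 0 however high the ceiling is raised.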

The method of deriving the four criteria scores is shown in 
Appendix II. 


THE SELECTION OF TESTS FOR THE I.E.R. GENERAL CLERICAL 
SCALE, C-1 


After the tests had been administered and scored according to 
the standard scoring directions, and after the criteria of ability to 
progress in the typing, stenography and bookkeeping courses had 
been computed, the correlations of the 32 tests and also of age and 
last school grade completed were computed with the three criteria 
respectively. These correlations are shown in Table XVII to- 
gether with the old time limits (that is, the time limits used in the 
B-C Business College, B-M Business College, and B-D Business 
School) ; the new time limits (those established for the final revised 
set of tests with the exception of test 13, which is given an extra 
half minute in the T.C. edition of the General Clerical Test); the 
maximum scores on the several tests; the class intervals, (I), used 
in transmuting the gross scores to small class numbers; the 
standard deviations and averages by the old time limits, for the 
three groups, typing, stenography and bookkeeping, respectively. 

The table used for transmuting the original gross scores into 
transmuted classes, for ease of computation of correlation co- 
efficients, is shown in Table XVIII. When transmuting the 
scores of any test, say Test 1, one uses the column of Table XVIII 
which has the same class interval; thus Test 1 has the class inter- 
val 3 and one would use the (I =3)-column of the table. Gross 
scores of 12 or 13 or 14 would each be called transmuted scores of 
5; scores of 15, 16 or 17 on this test would thus be called trans- 
muted scores of 6, and so on. 
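The grouping table appears to follow a single rule: divide the gross score by the class interval, drop the remainder, and add one. The sketch below states that rule; it agrees with the worked example for Test 1 (class interval 3), but the rule itself is inferred from the table, and the function name is hypothetical.

```python
def transmute(gross_score, class_interval):
    """Return the transmuted class for a gross score and class interval I."""
    if gross_score < 0 or class_interval < 1:
        raise ValueError("gross_score must be >= 0 and class_interval >= 1")
    return gross_score // class_interval + 1


# Worked example from the text: Test 1 has class interval 3.
for score in (12, 13, 14, 15, 16, 17):
    print(score, "->", transmute(score, 3))
```

The printed table stops at class 18, so any class above 18 returned by this function goes beyond what the table itself supplies.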

Scoring formulae, in the case of stenographers, were computed 
for each of 31 tests (Test 3 does not have any “Wrongs” and 
consequently a scoring formula is impossible). Without entering 
into details, it may be generally stated that the result was quite 
unsatisfactory, none of the tests yielding enough larger correlations 
with the criterion by the use of the scoring formula to justify the 
added labor of using a scoring formula. In many cases the scoring 
formula gave positive credit for “Wrongs” rather than imposed 
penalties. Speed, rather than accuracy, is significant in many of 
the tests since the scoring formula, S = R + CW, exactly equals 
“Attempts.” This does not at all mean that stenographers do 
not need to be accurate. Rather is the explanation analogous to 
the kind of handwriting required in everyday life; after one can 
write legibly there is little need for one’s improving the quality of 
his handwriting unless he is a professional copyist or letter ad- 
dresser, while increments in the speed are very desirable; thus, if 
one who can write legibly could increase his speed up to the point 
of 80 words per minute, there would be absolutely no need, in his 
case, for employing a stenographer or for using shorthand, pro- 
vided he could maintain the speed for some length of time. 
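The remark that the scoring formula can equal “Attempts” may be illustrated as follows, reading the formula as S = R + CW with C a constant weight on the Wrongs. That reading is an assumption, since the printed formula is not fully legible, and the function names are hypothetical. When C is +1 every attempt counts alike, so the score measures speed alone.

```python
def formula_score(rights, wrongs, c):
    """Score S = R + C*W, where c is the weight given to each Wrong."""
    return rights + c * wrongs


def attempts(rights, wrongs):
    """Number of items attempted: every Right or Wrong is one attempt."""
    return rights + wrongs


# A subject with 20 Rights and 5 Wrongs under three illustrative weights.
for c in (-1, 0, 1):
    print(f"C = {c:+d}: S = {formula_score(20, 5, c)}")
```

A weight of 0 gives the Rights-only score, and a negative weight penalizes errors in the usual way.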

Neither does this mean that scoring formulae might not yield 
desirable advantages in increased correlation with the criterion of 
tests to be used for predicting rates of progress in bookkeeping or 
typing. Administration of the test will be much simpler if 
“Rights” (R) only are used as a score, and consequently the 
number of Rights has been used as the score in all the correlations 
later included in this report. 


TABLE XVIII 


STANDARD GROUPING TABLES, USED IN TRANSMUTING UNIT TEST SCORES 


CALL IT | SCORE IS THE CORRESPONDING CLASS ON THE LEFT            | CALL IT 
CLASS   | AND THE RIGHT WHEN THE CLASS INTERVAL, I, IS:           | CLASS 
        |   1  |   2   |   3   |   4   |   5   |    6    | 
   1    |   0  |  0-1  |  0-2  |  0-3  |  0-4  |   0-5   |    1 
   2    |   1  |  2-3  |  3-5  |  4-7  |  5-9  |   6-11  |    2 
   3    |   2  |  4-5  |  6-8  |  8-11 | 10-14 |  12-17  |    3 
   4    |   3  |  6-7  |  9-11 | 12-15 | 15-19 |  18-23  |    4 
   5    |   4  |  8-9  | 12-14 | 16-19 | 20-24 |  24-29  |    5 
   6    |   5  | 10-11 | 15-17 | 20-23 | 25-29 |  30-35  |    6 
   7    |   6  | 12-13 | 18-20 | 24-27 | 30-34 |  36-41  |    7 
   8    |   7  | 14-15 | 21-23 | 28-31 | 35-39 |  42-47  |    8 
   9    |   8  | 16-17 | 24-26 | 32-35 | 40-44 |  48-53  |    9 
  10    |   9  | 18-19 | 27-29 | 36-39 | 45-49 |  54-59  |   10 
  11    |  10  | 20-21 | 30-32 | 40-43 | 50-54 |  60-65  |   11 
  12    |  11  | 22-23 | 33-35 | 44-47 | 55-59 |  66-71  |   12 
  13    |  12  | 24-25 | 36-38 | 48-51 | 60-64 |  72-77  |   13 
  14    |  13  | 26-27 | 39-41 | 52-55 | 65-69 |  78-83  |   14 
  15    |  14  | 28-29 | 42-44 | 56-59 | 70-74 |  84-89  |   15 
  16    |  15  | 30-31 | 45-47 | 60-63 | 75-79 |  90-95  |   16 
  17    |  16  | 32-33 | 48-50 | 64-67 | 80-84 |  96-101 |   17 
  18    |  17  | 34-35 | 51-53 | 68-71 | 85-89 | 102-107 |   18 

At this stage the technique for determining the 1 best tests was 
not yet available. Neither was the requisite amount of time 
available for solving all the possible intercorrelation coefficients 
and resorting to the “trial and error” method, which was the only 
feasible method then available for handling a situation of such 
magnitude as that involving 34 test variables. Instead, by the 
consensus judgment of three test workers, aided by the data of 
Table XVII, the following twelve tests were selected for further 
evaluation: 

Unit Test Nos. 6, 10, 12, 17, 20, 21, 23, 24, 25, 26, 28, 32. 
These were determined on the bases of (1) the magnitude of 
their correlations with the several criteria; (2) the fact that they 
gave positive correlations with all three criteria and required a 
minimum of revision in order to be thoroughly objective and 
satisfactory in their administration and scoring. 

At this point the tests were revised slightly by the addition of 
geometrical improvements in the test page, the changing of time 
limits and the re-wording of a very few questions which had given 
trouble previously. The changes were not of such a nature as to 
be expected to cause any marked changes in the correlations of the 
tests, either among themselves or with the criteria. 

The intercorrelations of the 12 test variables above named were 
now computed for each of the three criteria. By Dr. T. L. 
Kelley's “trial and error” method, the beta weights (see Table 
XIX) were determined through two or three successive 
approximations. These were not carried out further owing to lack of 
time. 

The seven starred tests, 6, 12, 17, 20, 24, 25, 26, became the 
Army clerical tests published by the Army E and R schools in 
1921. The needed constants of the army edition are shown in 
Table XX. 


THE METHOD OF DETERMINING THE NINE “GENERALLY BEST” 
CLERICAL TESTS 


After the seven tests had been selected by the process above 
described, the writer discovered the method of multiple ratio 
correlation. Without having tested out the truth of the assump- 
tion, it was assumed that when two tests added in turn to a 
previously existing composite yield different multiple ratio regres- 
sion weights, that one which gives the higher weight would give 
also the higher multiple ratio correlation. This was subsequently 
determined to be a false assumption, since the multiple ratio 
correlation is a function not only of the correlation of the variable 
added with the previously existing composite, but also of its 
correlation with the criterion. It has subsequently been empiri- 
cally determined that the assumption here used is a close approxi- 
mation to the truth, although, by using that method, we perhaps 
did not secure the maximum possible correlation nor the same 
selection of tests which we would have secured had we used a 
method based upon the multiple ratio correlation rather than upon
the multiple ratio regression weight. The false assumption then 
does not mean any error in the correlations or weights secured, but 
means only that possibly the best selection was not made. 
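
The point of the preceding paragraph can be illustrated numerically. The sketch below is an illustration only, under stated assumptions: it uses the ordinary two-variable multiple correlation formula rather than the multiple ratio procedure itself, and the correlations and the names `beta_weights_two` and `multiple_r_two` are hypothetical, not figures or terms from this study.

```python
import math

def beta_weights_two(r_ac, r_bc, r_ab):
    """Standardized regression weights for predicting c from a and b."""
    denom = 1.0 - r_ab ** 2
    beta_a = (r_ac - r_bc * r_ab) / denom
    beta_b = (r_bc - r_ac * r_ab) / denom
    return beta_a, beta_b

def multiple_r_two(r_ac, r_bc, r_ab):
    """Multiple correlation of c with the best-weighted sum of a and b."""
    beta_a, beta_b = beta_weights_two(r_ac, r_bc, r_ab)
    return math.sqrt(beta_a * r_ac + beta_b * r_bc)

# Hypothetical: the existing composite a correlates .50 with the criterion.
R_A_CRIT = 0.50
# Each candidate: (correlation with criterion, correlation with composite).
candidates = {"x": (0.30, 0.00), "y": (0.50, 0.60)}

results = {}
for name, (r_crit, r_comp) in candidates.items():
    beta_a, beta_new = beta_weights_two(R_A_CRIT, r_crit, r_comp)
    results[name] = {
        # weight of the added test relative to the composite's weight
        "ratio_weight": beta_new / beta_a,
        "multiple_r": multiple_r_two(R_A_CRIT, r_crit, r_comp),
    }
    print(name, round(results[name]["ratio_weight"], 3),
          round(results[name]["multiple_r"], 3))
```

With these hypothetical figures candidate y receives the larger relative weight, yet candidate x gives the higher multiple correlation, which is the situation the paragraph describes: the weight alone does not settle which addition raises the correlation most.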


TABLE XIX

LAST APPROXIMATION TO BETA WEIGHTS OF TWELVE CLERICAL TESTS

             LAST APPROXIMATION TO
             β-WEIGHT FOR CRITERIA       REVISED                   NEW ARMY
TEST No.     Typ.     Stenog.    Bkg.    TIME IN MIN.   SELECTED   TEST No.

   6         .22       .60      -.02        2              *          1
  10         .17       .46       .06
  12         .20       .55       .26        3½             *          3
  17         .18       .40       .19        5              *          5
  20         .33       .59       .03        2½             *          4
  21         .10       .55       .07        3
  23         .03       .20       .20        2½
  24         .11       .51       .18        4½             *          6
  25         .52       .46       .27        3              *          7
  26         .21       .48       .24        3½             *          2
  28         .03       .07       .27        3
  32         .10       .55       .08        1½

Σ 12         N=37      N=37      N=49       Total time of 12 tests = 40 min.
                                            Total time of 7 selected tests = 24 minutes


The twelve tests which had been previously weighted by Dr. 
Kelley’s “trial and error” method were subjected to the two
following procedures: 

A subsequent application of the formula for determining r_Tc by
giving zero weight to the variable considered¹ was made in
succession to variables 17, 12, 6, 25, 24, 26, 20. This showed that
tests 17, 12 and 6 might be immediately discarded from further 
consideration as contributing but little to the three combination 
correlation coefficients when the other nine tests are included with 
their then available weights. The results also showed that tests 


1 Kelley, T. L. Tables to Facilitate the Calculation of Partial Coefficients of Corre- 
lation, etc. Bulletin 27, Univ. of Texas, Austin, Tex., 1914 (out of print), Formula 
b, p. 23. 
25, 24 and onwards in the present C-1 series could not profitably
be discarded from the combination, as their elimination would
have large effects in decreasing the values of the three r_Tc’s.

The list of intercorrelations which were computed was increased 
finally to include tests 20, 24, 26, 28, 23, 32, 21, 10, 1, 11, 13, 16,
27, 9, 25, Age, School Grade. 

1. Test 20, which had the highest average correlation with the 
three criteria, was taken to be the backbone test for each of the 
three scales which we were beginning to construct, a typing scale, 
a stenography scale and a bookkeeping scale. 

2. The multiple ratio regression weights for each of the eight 
other variables in the case of each of the three criteria were com- 
puted. By this procedure Test 24 proved to have the highest 
average multiple ratio regression weight. Accordingly, it was 
taken to be the second test in each of the three scales under 
construction. The multiple ratio correlation coefficients r_Tc
were computed for each of the three scales. These yielded in- 
creases in the correlation coefficient in each case above the corre- 
lations which obtained between the criteria and Test 20 alone. 

3. By computing the multiple ratio regression weights of the 
remaining seven variables for each of the three criteria in turn, 
Test 26 proved to have the highest average multiple ratio regression
weight. Again, with Test 26 included as the third test, the new 
increased multiple ratio correlation coefficient was computed for 
each of the three criteria and all gave increases in the correlations 
of the several criteria above the previously existing composites. 

4. By a repetition of the above process the other tests thus 
to be added are in turn Tests 28, 13, 9, 11, 16, 25 and School 
Grade. 
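
Steps 1 to 4 amount to a step-by-step selection in which one test is added at a time. The following is a minimal sketch of such a procedure, not the computation actually used here. It rests on several assumptions: a single criterion is used instead of three, each added test is chosen by the gain in the ordinary multiple correlation (the basis the text later calls preferable) rather than by the average multiple ratio regression weight, and the correlation figures and function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # partial pivoting: bring the largest entry of this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_r(chosen, r_tests, r_crit):
    """Multiple correlation of the criterion with the chosen tests."""
    sub = [[r_tests[i][j] for j in chosen] for i in chosen]
    rc = [r_crit[i] for i in chosen]
    betas = solve(sub, rc)
    return math.sqrt(sum(b * r for b, r in zip(betas, rc)))

def forward_select(r_tests, r_crit, n_keep):
    """Start from the best single test, then add whichever test raises R most."""
    chosen = [max(range(len(r_crit)), key=lambda i: r_crit[i])]
    history = [(chosen[0], multiple_r(chosen, r_tests, r_crit))]
    while len(chosen) < n_keep:
        rest = [i for i in range(len(r_crit)) if i not in chosen]
        best = max(rest, key=lambda i: multiple_r(chosen + [i], r_tests, r_crit))
        chosen.append(best)
        history.append((best, multiple_r(chosen, r_tests, r_crit)))
    return history

# Hypothetical intercorrelations of three tests and their criterion correlations.
R_TESTS = [[1.0, 0.6, 0.0],
           [0.6, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
R_CRIT = [0.50, 0.45, 0.30]

history = forward_select(R_TESTS, R_CRIT, n_keep=3)
for index, r in history:
    print("added test", index, "multiple R =", round(r, 3))
```

In this hypothetical case the third test is added before the second, although its own criterion correlation is lower, because it overlaps less with the test already chosen.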

Without the inclusion of School Grade, the correlations with 
the criteria of the three scales are as follows: 


Typing scale with typing criterion................. r = .62 ± .07
Stenography scale with stenography criterion....... r = .71 ± .06
Bookkeeping scale with bookkeeping criterion....... r = .59 ± .06


The General Business criterion was subsequently computed, and 
the General Business scale, when the tests are entered in the order 
as previously determined, correlates with the General Business 
criterion to the extent of .58 ± .05. With the inclusion of School
Grade, which may be used as a test, the correlations with the 
several criteria are as follows: 
Typing scale with typing criterion................. r = .62 ± .07
Stenography scale with stenography criterion....... r = .71 ± .06
Bookkeeping scale with bookkeeping criterion....... r = .64 ± .06
General business scale with general business criterion.. r = .59 ± .05


The probable errors of these correlation coefficients range from .05 
to .07. It will thus be seen that these scales in the clerical field
compare quite favorably with scales for predicting success in 
academic school courses. The weights in some of the scales for 
some of the tests come out negative. This is undesirable in a 
general scale such as a general intelligence scale, since, if it became 
known to the pupils that credit were subtracted for a high score on 
a test, it would be unlikely that any one would make a high score 
on a test. However, in the differentiation between different 
business school courses, these negative weights are ones which are 
quite desirable for producing differential variations in the total 
fitness scores of a given person for each of the three courses. 
Negative weights are then actually desirable in the problem of 
securing differential fitness scores for various occupations or 
various school subjects. We are badly in need of a formula for 
the reliability of such weights in order to know whether such 
negative weights will remain of the same approximate magnitude 
were the weights to be recomputed on a second experimental 
group, or even whether they would remain negative on a second 
experiment. If the probable error of such a weight were too high 
we would be uncertain whether a negative weight would retain its 
negative sign upon a second repetition of the experiment. 

If further experiment were to prove the feasibility of such 
differential weights for predicting progress in different occupations 
or in different occupational courses, the added validity secured by 
such differential weights would justify our giving a much larger 
number of tests than we intend using in any one particular scale 
and then weighting the n highest tests in the case of each of the
various courses or occupations for which we wished to determine 
the fitness of the individual. With, say, twenty tests so chosen 
that there would be ten extremely good tests for predicting ability 
in four different lines, one would score all twenty tests but would 
weight in the case of each of the four courses considered only those 
ten tests which best predicted ability in that course. The ad- 
dition, of course, of the other ten tests to each of the scales would 
add something to the predictive value but hardly enough to 
justify the extra trouble involved in weighting them. The weight-
ing can be easily taken care of by mechanical stencils which give
the weighted fitness scores corresponding to all possible raw
scores on the various tests. 

The writer has long been keenly aware of the seemingly great 
waste involved in giving a new test whenever some little additional
fact is to be discovered in regard to a person’s ability. In some 
school systems, where testing is quite in vogue, the pupils may 























TABLE XXI

TABLE OF CONSTANTS FOR GENERAL CLERICAL TEST, C-1, B-C BUSINESS COLLEGE *

         TIME IN MIN.   CUM. TIME IN MIN.      MULTIPLE RATIO r_Tc                 β-WEIGHTS
TEST
No.      Old     New     Old     New      Typ.   Sten.   Bkg.   Gen.Bus.    Typ.   Sten.   Bkg.   Gen.Bus.
20       3½      2½      3½      2½       .546   .497    .317   .425        1.00   1.00    1.00   1.00
24       3½      4½      7       7        .549   .536    .407   .474         .11    .66    1.05    .65
26       4       3½      11      10½      .550   .607    .461   .520         .05   1.05    1.19    .83
28       3½      3       14½     13½      .551   .651    .519   .522        -.08   -.77    2.00    .15
13       5       5       19½     18½      .587   .668    .551   .539         .41    .46    1.80    .64
9        5½      5½      25      24       .588   .693    .571   .560        -.08    .73    1.83    .84
11       3       4       28      28       .610   .693    .574   .568         .34   -.05     .79    .55
16       3       2       31      30       .614   .705    .578   .568         .21   -.46    1.15    .12
25       4       3       35      33       .655   .710    .594   .584        -.10    .31    2.19   1.04
School
Grade    0       0       35      33       .62    .71     .641   .59          .27    .27    4.83    .42

Number of cases ........................   37     37      49     81          37     37      49     81
Sampling correction for r_Tc ...........  .008   .005    .006   .004
P.E. of r_Tc ...........................  .069   .056    .057   .049
Highest single test .................... Test 20 Test 20 Sch. Grd. Test 20
Highest single r .......................  .546   .497    .484   .425

* The Probable Error of the Correlation Coefficients in Table XXI may be found from the
following table:

                          P.E. WHEN r IS:
N =     0     ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9
37     .11    .11    .11    .10    .09    .08    .07    .06    .04    .02
49     .10    .10    .09    .09    .08    .07    .06    .05    .03    .02
81     .08    .08    .07    .07    .06    .06    .05    .04    .03    .01
TABLE XXII

TABLE OF MULTIPLE RATIO CORRELATIONS OF THREE SCALES AS THE LESS
USEFUL TESTS ARE ELIMINATED IN ORDER, RETAINING THE BEST TEST
FOR THE FINAL SINGLE TEST SCALE. ARMY FORM

                                  TIME                 r_Tc
TESTS IN SERIES                  IN MIN.     Typ.     Sten.    Bkg.

20                                 2½        .546     .497     .317
26, 20                             6         .547     .596     .305
24, 26, 20                        10½        .549     .605     .416
25, 24, 26, 20                    13½        .554     .605     .528
6, 25, 24, 26, 20                 15½        .554     .612     .519
12, 6, 25, 24, 26, 20             19         .556     .611     .537
17, 12, 6, 25, 24, 26, 20         24         .550     .611     .541
Number of Cases*                              37       37       49

* For P.E., see footnote to Table XXI.



receive as many as fifty tests in the run of a year. A composite of 
all of these fifty would have predicted any one of the facts which 
any of the scales was aiming to determine very much better than 
any single one of the scales. Any twenty-five of those fifty tests, 
selected at random, would undoubtedly have predicted any of 
those abilities better than any two of the scales used. Knowing 
in addition one’s reading and arithmetic ability, one can predict 
better his ability to get along in a vocational course than if he 
merely knows his score on some mechanical test such as the 
Stenquist Assembly Test. The Stenquist Assembly Test prob- 
ably correlates higher with success in many mechanical courses 
than any single one of the tests which we might give, but the 
Stenquist test plus any kind of intelligence test will assuredly tell 
us more about ability to progress in a mechanical course than will 
the Stenquist Test alone. Likewise, the school records which 
have accumulated during the school career of pupils, if available 
in the 7th or the 8th grade, may have very high value for predict- 
ing their further success either in industry or in continued school 
work. This is shown under the discussion of this topic in a 
previous section of this report. 
THE B-D ARMY BUSINESS SCHOOL RESULTS


In order to determine whether the nine “generally best” tests
and School Grade, ten variables in all, selected upon the results 
from the B-C Business College, would yield reliable predictions 
with the weights given them, when applied to a different business 
school group, the results of these ten variables were worked up 
on the B-D business school soldiers who took the thirty-two busi- 
ness school tests at the same time as the B-C Business College 
students. The tests were given under standard conditions by the 
camp examiner, Mr. F. A. Moss. The pupils were rated four 
times independently on the traits defined below by their instruc-
tors on a scale of 1 to 5: “lowest, low, average, high, highest.”
The instructions for the ratings were typewritten and handed to 
the instructors, and for the four ratings in the subjects of stenog- 
raphy, typing and bookkeeping were as follows: 


Stenographers: 


a) First ranking: Include in your judgment of merit the factors 
which you usually take into consideration as contributing to 
progress in stenography. 

b) Second ranking: Consider here only the readiness and 
accuracy with which the pupil grasps, remembers, and employs 
the theory of stenographic characters. 

c) Third ranking: Consider only the interest in the work, 
ambition to succeed, and general class morale. 

d) Fourth ranking: Consider exactly the same factors that you 
used in the first ranking. 


Typists: 

a) First ranking: Include in your judgment of merit the factors 
which you usually take into consideration as contributing to 
progress in the acquirement of skill in typing. 

b) Second ranking: Consider here only your impressions of the 
order in which you would choose the pupils if you wanted a faith- 
ful copy of matter submitted, handed in promptly. Record the 
order. 

c) Third ranking: Consider only interest in the work, ambition 
to succeed, and general class morale. 

d) Fourth ranking: Consider exactly the same factors that you 
used in the first ranking. 


Bookkeepers: 


Follow the same general principles as are laid down for securing 
data for criteria scores for stenographers and typists. 
1. Secure as many grades of all sorts as are available. Any
numerical grade is valuable if reported for all members of the
class. Grades reported in terms of “Excellent,” “Good,” etc.,
are of considerable value as are also letter grades of “A,” “B,”
“C,” etc. All are of more value if taken at different intervals of
time throughout the course. Each individual grade series should 
be reported. 

2. Ratings of speed and accuracy should be reported separately. 

3. Secure independent judgment rankings in order of merit by 
the method described above, using the following central theme as 
a basis for judgment on each occasion: 

a) First ranking: Consider general ability to make progress in 
acquiring a mastery of the course. 

b) Second ranking: Consider the probability that the pupil’s
books will “prove up” on any trial balance, or in other words that
his books are accurate.

c) Third ranking: Consider only interest in the work, ambition 
to succeed, and general class morale. 

d) Fourth ranking: Consider exactly the same factors that you 
used in the first ranking. 


With these ratings arranged in parallel columns, it was apparent 
that the standard deviations were approximately equal in the four 
columns of ratings of each subject. Accordingly, the gross 
measures of the four ratings were summated, which procedure 
amounted approximately to obtaining the average position in 
the four rankings. This combined score became the criterion 
in the courses in typing, stenography, and bookkeeping, respec- 
tively. The weights and standard deviations used for the ten 
variables respectively are shown in Table XXIII. The corre- 
lations between the test scores weighted by each of four weightings 
and these criteria were computed and are given in Table XXIV. 

There is every reason to believe that the criterion here is very 
much less reliable than that at B-C Business College; consequently,
even lower correlations of the scale with the criterion would be a
substantiation of the validity of the tests for measuring ability to
progress in acquiring typing, stenography, and bookkeeping.

Theoretically, it would be better to use, in determining the 
business school gross score weights, the β’s previously determined
and to use the σ’s of the new groups. This was not done because
of lack of time, and also from the consideration of the fact that any 
scale for wide use will probably have to have predetermined gross 
score weights. If all of the standard deviations changed propor- 
tionately on the several tests upon a change in the level of ability 
TABLE XXIII

THE σ’s AND M’s OF SELECTED 9 TESTS, OLD (B-C BUSINESS COLLEGE) TIME
LIMITS. B-C BUSINESS COLLEGE

Typing: N=37; Stenog.: N=37; Bkg.: N=49; Genl.: N=81

               σ’s (OLD TIME)                        M’s (OLD TIME)
TEST
No.     Typ.    Sten.    Bkg.    Genl.       Typ.    Sten.    Bkg.    Genl.*
20      [?]    16.34   15.59   16.070      68.47    [?]      [?]     .....
24      3.76    3.73    4.18    4.073      10.97   10.97     [?]     .....
26      9.64    9.70   10.18    9.840       [?]     [?]      [?]     .....
28      8.66    8.19    8.19    [?]         [?]     [?]      [?]     .....
13      2.61    2.56    2.93    2.797       6.65    6.40     6.96    .....
9       [?]    10.17    8.56    9.136      30.27   29.38    28.55    .....
11      3.99    4.34    4.74    4.367      27.31   26.77    24.42    .....
16      6.14    5.79    6.19    [?]         [?]     [?]      [?]     .....
25      6.65    6.46    6.10    [?]         [?]     [?]      [?]     .....
SG      1.64    1.67    1.81    1.767      10.11   10.03     9.46    .....

                    β’s                          GROSS SCORE WEIGHTS
TEST
No.     Typ.    Sten.    Bkg.    Genl.       Typ.     Sten.     Bkg.     Genl.
20      1.00    1.00    1.00    1.00        .0741    .0612    .0641    .0622
24       .11     .66    1.05     .65        .0293    .1769    .2512    .1596
26       .05    1.05    1.19     .83        .0052    .1082    .1169    .0843
28      -.08    -.77    2.00     .15       -.0092   -.0940    .2442    .0186
13       .41     .46    1.80     .64        .1571    .1797    .6143    .2288
9       -.08     .73    1.83     .84       -.0078    .0718    .2138    .0919
11       .34    -.05     .79     .55        .0852   -.0115    .1667    .1250
16       .21    -.46    1.15     .12        .0342   -.0794    .1858    .0204
25      -.10     .31    2.19    1.04       -.0150    .0480    .3590    .1659
SG       .27     .27    4.83     .42        .1646    .1617   2.6685    .2377

* Not computed for lack of time.


of test subjects, the relative importances assigned to the several 
tests would remain constant. The variabilities probably do not 
vary enough from proportional changes to make any marked 
effect in the final results. The likelihood of the above statements 
being true is enhanced by the consideration of the fact that, with 
the relative weights of tests approximately correct, fairly large 
TABLE XXIV

CORRELATIONS OF TEN VARIABLES, WEIGHTED BY EACH OF FOUR SERIES OF
WEIGHTINGS, WITH THE CRITERION SCORES OF STENOGRAPHY, TYPING
AND BOOKKEEPING, RESPECTIVELY *

                 CORRELATIONS WITH THE RESPECTIVE CRITERIA
                 WHEN WEIGHTED BY THE MULTIPLE RATIO WEIGHTS
                 PREVIOUSLY DETERMINED FOR:                        NUMBER
CRITERION                                                            OF
                 Typing   Stenography   Bookkeeping   General      CASES
                                                      Business

Typing ........   [?]        .34           .22          .18          49
Stenography ...   .79        .87           .73          .79           8
Bookkeeping ...   .42        .52           [?]          .28          19

* The Probable Error of the Correlation Coefficients in Table XXIV may be
found from the following table:

changes in the weights, if made at random, will have but a small 
effect on the multiple ratio correlation. This theorem allows one 
to use integral gross score weights, which are but good approxi- 
mations to the true relative gross score importances, with but 
little variation of r_Tc from its maximum amount.
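
The gross score weights of Table XXIII appear to be the β’s divided by the corresponding σ’s; the relation is not stated outright but can be checked against the tabled figures. The sketch below makes that check for two typing entries as read from Table XXIII. The function names `gross_score_weight` and `weighted_score` and the raw scores in the usage lines are hypothetical.

```python
def gross_score_weight(beta, sigma):
    """Weight applied directly to a raw score: the beta weight per unit of sigma."""
    return beta / sigma

def weighted_score(raw_scores, gross_weights):
    """Sum of raw scores, each multiplied by its gross score weight."""
    return sum(gross_weights[test] * raw_scores[test] for test in gross_weights)

# (beta, sigma) for the typing criterion, as read from Table XXIII.
typing_constants = {24: (0.11, 3.76), 13: (0.41, 2.61)}

typing_weights = {
    test: gross_score_weight(beta, sigma)
    for test, (beta, sigma) in typing_constants.items()
}
for test, weight in typing_weights.items():
    print("Test", test, "gross score weight", round(weight, 4))

# Hypothetical raw scores for one pupil on the same two tests.
example_total = weighted_score({24: 11, 13: 7}, typing_weights)
```

The two quotients round to .0293 and .1571, the values printed in the typing column of the table. Replacing such weights by convenient whole numbers in roughly the same proportions is the approximation the theorem above is said to permit.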

After the twenty-eight page folder of tests (to be described 
later) had been administered to the Company I employees and the 
correlations had been computed, it became evident that Test 1, 
Arithmetic (Correct and Incorrect Additions and Subtractions, 
the requirement being that errors be checked) was one of the best 
tests in predicting the criterion scores in Company I. It was 
added as an additional test at a gross score weight of 3, which 
gives it an importance about the same as that of Test 9. This 
addition may not add materially to the multiple ratio correlation 
coefficient. Since, however, it is placed as No. 1 of the ten tests
of the I.E.R. General Clerical Scale, it acts as a good “buffer
test.” The requirements are readily grasped by any clerical
worker, and three minutes spent on this test serve to allay any
nervousness which the test subject may have. Even though the 
addition of such a test does not add materially to the validity of 
the test it may add something to its reliability. Its arbitrary 
inclusion, therefore, seems quite justified. The nine tests hitherto 
described, plus this new test, became the I.E.R. General Clerical 
Test, C-1, sometimes referred to as the Toops Clerical, or Toops 
Business, Test. 


THE RELIABILITY OF THE I.E.R. GENERAL CLERICAL TEST 


The I.E.R. General Clerical Test, Form A, used throughout 
this investigation, was given by Mr. Luton Ackerson in April 
1922 to seniors (3-year course) in business classes in the Julia 
Richman High School. Form B was given the following June 
to the same pupils. There were 145 pupils who took both forms. 
The tests were weighted with the general gross score weights: 


                 Wt.                       Wt.
Test 1 ........   3¹      Test 20 .......   2
Test 9 ........   3       Test 24 .......   6
Test 11 .......   3       Test 25 .......   4
Test 13 .......  10       Test 26 .......   2
Test 16 .......   4       Test 28 .......   1½


The correlation between the total score of the first and second 
giving is .82 ± .02.

Inasmuch as a high school business group in the senior year is a 
highly selected group, it seems quite likely that the reliability of 
the scale when applied to a thirteen- or fourteen-year-old group 
would be substantially of the same order as the reliability co- 
efficients of the well-known intelligence scales. The average 
score on Form A was 71.7 and on Form B, 81.0, or a gain of 13 per 
cent of the second giving upon the first. It is obviously impos- 
sible to state to what extent this gain is due to an initial lesser 
difficulty of Form B over Form A. The variability is about the 
same in both cases, the standard deviation being 11.3 for Form A 
and 11.5 for Form B. 

1 This weight refers to the original form of Test 1, where the incorrect additions
only were checked; in the new printed form where both correct and incorrect
additions are marked, the weight of Test 1 is 1½.
CORRELATION OF I.E.R. TEST C-1 WITH SUCCESS IN ACTUAL
CLERICAL WORK


We have shown that the selected set of nine (or ten) tests is 
predictive of success in business schools. We have now to de- 
scribe its value in diagnosis and prediction of success in actual 
clerical work. 

As always, the only decisive experiments will be those where the 
tests are given to young people whose later careers are followed. 
However, we may profitably study individuals already engaged in 
clerical work, comparing the scores they make in the tests with 
their demonstrated success “on the job.”

A valid measurement of demonstrated success in clerical work is 
not easy to find and apply. Probably the most reasonable 
measure to use would be salary (per unit of time) attained at 
equal age after equal length of experience, with some allowance for 
pleasant or unpleasant conditions of work. We have been un- 
able to find any group available for test whose individuals could 
be so measured. The next best criterion would be a ranking in 
order of success given by superior officers who would allow, at 
least roughly, for age and experience, and who would consider 
salary, attractiveness of work, and promise of promotion due to 
the quality and quantity of work being done by the individual in
comparison with those of like present salary. We have been 
able to obtain a criterion approximating to this in the case of 73 
clerical employees in Company W. We had also the very great 
advantage of being able to test these individuals with the later 
constructed I.E.R. Test C-2 and the Stenquist Assembly Test.

The opportunity to give the tests to these employees of Com- 
pany W came, however, only near the end of our year’s work. 
In the meantime, we secured such data as we could; and it seems
best in this report to follow roughly the chronological order of our
work.


TESTS OF COMPANY I EMPLOYEES 


The group of ten business tests chosen for the final series, and a 
number of other tests in addition, were given to 301 employees of 
Company I. The efficiency ratings on two of the subjects were 
not available at the time, so that the report on the combined 
groups below consists of only 299 cases. Four alternative forms 
of Unit Test 20, Vocabulary, were assembled and given to these 
subjects. Three forms of Test 28, Coördinates, were also pre-
pared and given. In addition to the series of ten tests of C-1,
five forms of the Woodworth-Wells Number Checking, Test 39, 
were provided and given, each form being given with a two- 
minute time limit. Three forms of finding addresses, Test 40 
(Test 6 of the Thorndike Non-Verbal Clerical C-2, but differing in
that the C-2 series has a six minute time limit) were prepared and 
each form was given with a time limit of three minutes. A test of 
writing numbers, Test 41, in squares of uniform size, ½ inch
square, beginning with 11 and continuing upward serially, was 
given with a two-minute time limit. Test 18, from the Army
Beta, Same-Different Numbers, was given with a three-minute 
time limit. The test of alphabetical filing of names in the revised 
form (Unit Test 23) involving the idea originally used in the 
Thurstone name-filing test of his clerical series, was given in two 
forms with a time limit of 2½ minutes each. The test was
changed so that the subject had merely to write on the blank 
preceding each name in the work column the number correspond- 
ing to his name as found in the alphabetical column. This 
changes the test somewhat from Thurstone’s original test but 
makes it more readily scorable by stencil. And, finally, the last 
test was the Company I form of same-different numbers and 
names originally modeled after forms used by Thurstone, Thorn-
dike, and Army Beta. This test is referred to as “Page 28” in the
intercorrelation tables. The employees had all been rated by
their superintendents some two or three weeks previous to the 
giving of the test in the routine periodical rating which is taken 
every six months by this firm. The rating scale used is the Com- 
pany I revision of the Scott rating scale plan adapted to clerical 
employees. The company has an elaborate set of tables in which 
the wages of employees are supposed to be regularly advanced 
according to the increase in ratings received on this rating scheme. 
The detail with which these tables have been made out, and the 
fact that psychological tests and ratings have been a fundamental 
part of the personnel work of this firm for a number of years, led 
us to believe that in all probability the ratings of these employees 
were as accurate as were to be found anywhere in industry. 
This group of clerical workers was chosen for that reason and for 
the additional one of the hearty interest in the tests evidenced by 
the management. 
Without going into the laborious procedure of weighting each
test according to the exact weights determined from the B-C 
Business College group, the arbitrary weights of the following 
table were assigned to the several tests, resulting in the approxi-
mate gross score weights, hereinafter called “gross score weights,”
of 3, 3, 3, 10, 4, 2, 6, 4, 2, 1½ for Tests 1, 9, 11, 13, 16, 20, 24, 25,
26, 28 respectively.


TABLE XXV 


DERIVATION OF ARBITRARY GROSS SCORE WEIGHTS OF TEST C-1
* When only one page (one form) of Test 20 is given, it should be given a gross
score weight of 2 in order to maintain its relative importance; similarly, when only
one page (one form) of Test 28 is given, it should be weighted 1½. The general
principle is that with n pages weighted W, one page should be weighted n·W in order
to maintain its importance relative to the other tests.
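
The principle of the footnote can be put as a short computation. This is a sketch under assumptions: the per-page weight of ½ for Tests 20 and 28 is not printed in the surviving text but is inferred from the single-page weights of 2 and 1½ together with the four forms of Test 20 and three forms of Test 28 mentioned earlier; the function names and the raw scores are hypothetical.

```python
def single_page_weight(n_pages, weight_per_page):
    """Weight for one page standing in for n pages that were each weighted W."""
    return n_pages * weight_per_page

def weighted_total(raw_scores, weights):
    """Total score: each raw score multiplied by its gross score weight."""
    return sum(weights[test] * raw_scores[test] for test in weights)

# Inferred per-page weight of 1/2 (an assumption, see the lead-in).
test_20_one_page = single_page_weight(4, 0.5)   # four forms of Test 20
test_28_one_page = single_page_weight(3, 0.5)   # three forms of Test 28
print(test_20_one_page, test_28_one_page)

# Hypothetical raw scores on three of the tests, with weights as given in the text.
weights = {13: 10, 24: 6, 20: test_20_one_page}
raw_scores = {13: 5, 24: 10, 20: 30}
total = weighted_total(raw_scores, weights)
print(total)
```

Under the inferred per-page weight the function returns 2 and 1½, the single-page weights named in the footnote.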


The 299 cases having complete records were divided up into 
four clerical groups based on similarity of name of occupation. 
This grouping does not necessarily mean a classification based 
upon the greatest similarity of work. The groups are: 


Regular Business ........................................  90 cases
Typists .................................................  28 cases
File Clerks .............................................  32 cases
District, Transfer, and All Others not Otherwise Tabulated 149 cases

Total ................................................... 299 cases


The regular business clerks are for the most part those who do the 
odd-job clerical work and are the poorest paid. The typists and 
file clerks for the most part have a more routine or specialized type 
of work than is the case in most firms. Those district and trans- 
fer clerks who work on specialized clerical work require in general 
more ability than is required of the regular business clerks. With
this group, however, is combined a motley group of “all others”
about whom nothing in general can be said more than that they
are occupied on specialized tasks each requiring no more than 1
to 4 or 5 persons in the entire establishment. The following cor-
relations result:


                                                                      r
Typists and Regular Business Clerks¹ (118 cases) Criterion
     and Weighted Scores ....................................... .07 ± .06
Regular Business (90 cases) Criterion and Weighted Scores ...... .15 ± .07
Typists (28 cases) Criterion and Weighted Scores ............... .02 ± .13
District, Transfer (149 cases), and All Others Criterion
     and Weighted Scores ....................................... .03 ± .05
File Clerks (32 cases) Criterion and Weighted Scores ........... [?]


In an effort to discover whether length of service had any effect 
on the criterion (for if it has then it must be taken into account in 
the criterion used for testing the tests), the following correlations 
were computed: 


TYPISTS AND REGULAR BUSINESS COMBINED (118 cases)

                                                                    r
Criterion and Nine Weighted Test Scores² ...................... .07 ± .06
Criterion and “Rated after Months” (Length of Specific Job) ... .20 ± .06
Criterion and “Length of Time with Firm” ...................... .07 ± .06
Weighted Total Nine Tests² and Length of Time with Firm ....... .10 ± .06


Thus, as shown by the low correlation of criterion and experi- 
ence (either length of time with the firm or length of time on the 
specific job), the criterion is not badly affected by experience 
attenuation, at least with the two groups thus here combined. 
Yet time on the specific job is a better predictor of criterion score 
than the nine tests. Neither, of course, is high enough to be of
any value as tests. 

Dr. Thorndike computed the correlation between the old Com- 
pany I examination devised by him in 1914 and the official 1921 
rating in the case of twenty-four clerks examined in 1915 and at 
work in 1921, with the result: r = .38 ± .12. This does not neces-
sarily indicate (with this N) a higher relationship than on the nine
selected tests. In the case of the 32 (larger N) file clerks, rated


1 Typists have highest criterion scores and regular business lowest; therefore this 
combined group should give the maximum correlation. 
2 Test 1, Arithmetic, had not yet been added to the C-1 Clerical Series.
by one supervisor, we have four separate tests which correlate
with the filing criterion to a greater extent than .30, the highest 
single test correlating with the criterion to the extent of .48. 
Some three months previous to the giving of these tests, the 
management of Company I decided to revise the examination 


TABLE XXVI

CORRELATIONS OF VARIOUS TESTS WITH COMPANY I RATING *

                                      CORRELATION WITH CRITERION OF:

                                                          Dist.,
                                                          Transfer
                          Test      Reg.            File  and All  All
Test Name                 No.       Bus.  Typists  Clerks Others   Combined     σ

Arithmetic                1         .27    [?]      .21     .04     .09        8.8
Digit Symbol              9         .03    [?]      .07     .08    −.02        9.8
Number Copying            11       −.03   −.18      .09     .07     .03        8.0
Fruit Tabulation          13        .20    .00      .00     .04     .07       25.4
Holley Vocabulary         16        [?]    .04      [?]     .03     .08        [?]
Thorndike Vocabulary      20(1-4)   .14    .23      .15     .01     .01       56.8
Reading Backwards         24        [?]   −.15     −.30     .07    −.05        [?]
Business Information      25       −.02   −.02      .24    −.10    −.06        4.6
Number Circling           26        .17   −.18      .31    −.11    −.01        9.5
Coordinates               28(1-3)   .15    .10      .31     .00    −.01       28.4
Woodworth Number
  Checking                39(1-5)   .13    .05      .09    −.02     .01       73.8
Finding Addresses         40(1-3)   .07    [?]      .03     .06     .01        [?]
Number Writing            41        [?]   −.07     −.10     .05     .00       13.0
Same-Different Numbers    18        .15    .01      .07    −.07    −.04        6.5
Name Filing               23(1-2)  −.02   −.16      .34     .12     .03       14.0
Company I Same-Diff.      Page 28   .05    [?]      .48     .04     .04       11.8
School Grade Completed    ......    .16    .25      .04     .02     .00
Number of Cases           ......    90     28       32      149     299





* The Probable Error of the Correlation Coefficients in Table XXVI may be found from 
the following table: 
which had been prepared by Dr. Thorndike for them in 1914.
The new examination does not take nearly so long to administer as
the earlier form and is given in omnibus form with an over-all time
limit of 75 minutes. This form of examination had been given to 307
cases whose ratings were available for comparison with the new
examination. The correlations were worked out by Dr. Thorndike, with
the result that the correlation of the Company's new examination
with the new rating (the rating used by us) is .16 ± .04, with
N = 307 cases.

In order to determine whether possible coaching outside the
test room was responsible for the low correlations, the correlation
between the weighted nine-test composite score and criterion for
the first (November 30, A.M.) section to take the tests (N = 77)
was computed, with the result: r = −.05 ± .08. The r would be
expected to be attenuated by coaching only in the case of later
subjects, since the first group might communicate information to
the later groups. Since the first section, which could not have
been coached, itself shows no correlation, we conclude that coaching
was not a factor in producing the low correlations.

The correlations of the several tests with the criterion, all groups
combined (299 cases), give the fifth column of Table XXVI.
This table shows as well the correlations of the separate tests with
the criterion in the case of the four groups: A. Regular Business
Clerks (90 cases); B. Typists (28 cases); C. File Clerks (32 cases);
D. District Clerks, Transfer Clerks and all others not tabulated
in A, B, or C (149 cases).

The correlations of last School Grade Completed and the criterion
in the case of the four groups and the total group are as shown
in the last horizontal row of correlations in Table XXVI.

It will be noted that none of the correlations above reported are 
of sufficient size to indicate that any of the tests given are of 
practical value for predicting the criteria of the separate groups or 
total group. A few of the single tests have correlations high 
enough to be of practical value, but there is no assurance that 
these would obtain upon the second giving of the tests. 

Much might be written in attempted explanation of these 
extraordinary results, but it seems best to say nothing, since we 
were unable to investigate the criterion itself. 
THE CORRELATIONS OF THE GENERAL CLERICAL TEST WITH
OTHER TESTS IN THE CASE OF SCHOOL CHILDREN


Shortly after the experiment with the employees of Company I, 
we conducted an extended series of experiments with school 
children, which were designed to reveal the suitability of our 
different tests for use with children, and the degree to which the 
different tests did measure different abilities. The General 
Clerical Test C-1 was satisfactory in respect to ease of giving and 
scoring and suitability for children of the ages in question. But 
it became clear from the correlations (cf. Table V, p. 22) that, in
the case of these children, it did not measure an ability much
different from that measured by any standard test of general
intelligence. The General Clerical Test correlates nearly .80 with
the Arith.-Re. test, and correlates with “Half-year Gains” and
“Average Work” as closely as the Arith.-Re. test does,
approximately .60.

It may, therefore, be that the correlations of about .70 found 
between the General Clerical Test (when properly weighted) and 
success in business school work in stenography, typing, and book- 
keeping are due to its value as a test of ability to deal with ideas 
rather than to its value as a test of ability to deal with clerical 
items and procedures. Or these two abilities may be more
nearly identical than we have supposed. On the other hand, the
correlation of .80 need not preclude a considerable independence.¹


TABLE XXVII


WEIGHTS OF THE SEPARATE TESTS OF THE I.E.R. TEST C-2


I.E.R. TEST C-2


These facts led us to provide the Routine Clerical Test C—2 of 
abilities presumably less intellectual than those included in the 
General Clerical Test C-1. 

Clerical Test C—2 is made up of six tests which were found by 
Thorndike to correlate with clerical ability, four of which were 
found by McCall to correlate very slightly with general intelli- 
gence in a grade population. 

It seemed desirable, in the opinion of Thorndike, to weight the 
tests according to the importances shown in Table XXVII. 

The seventh and eighth grade papers of Public School J were 
available for determining the relative variability of the tests. 
The approximate Q’s were determined by inspection of the dis- 
tributions, Tests 1, 2, and 3, all cancellation tests, being added 
together for convenience. These yield the final gross score 
weights respectively of 1, 4, 5, and 10. The scores weighted with
these weights are recorded in the master data books as “Thorn.
Wtd. Cler.”

The distributions of the 7A—8B inclusive pupils’ scores by tests 
are given in Table XXVIII. 

These distributions show that the time limits are quite satisfactory.
The directions for Tests 4 and 5 are seemingly quite too
difficult, since 20 per cent of the seventh and eighth grades of
Public School J make zero scores on Test 4, and 11 per cent on
Test 5. Because of administrative difficulties, Test 5 has been
eliminated from the form to be used in the 1922-23 investigation.
An alternative form of the five remaining tests is being constructed.

Test C-2 was given to the 318 girls in the experiments with
school pupils. Its correlations with other tests appear in Table
V. It correlates much less closely with Arith.-Re. (about .65)
than C-1 does, and somewhat less closely with School Success
(about .45) than C-1 does. Since, as we shall see later, it
correlates almost as closely as C-1 with success in actual clerical
work, it has been retained as a part of the total testing plan.


1 For, let us suppose that a good business test, which it would be practically
possible to construct, correlates to the extent of .80 with a valid clerical criterion,
and that this test correlates with general intelligence to the extent of .62. We may
determine the maximum correlation of the intelligence test with the clerical criterion
by assuming that the good clerical test plus the intelligence test properly weighted
predict the clerical criterion perfectly; or, call the correlation of the intelligence test
with the criterion $r_{IC}$; and then,

$$1.00 = \frac{(.80)^2 + r_{IC}^2 - 2(.80)(.62)\,r_{IC}}{1-(.62)^2}$$

whence, $r_{IC} = .967$.

The intelligence test will correlate the minimum amount with the criterion when its
addition to the clerical test adds nothing to the efficiency already possessed by the
scale of one test, namely the clerical test, or,

$$(.80)^2 = \frac{(.80)^2 + r_{IC}^2 - 2(.80)(.62)\,r_{IC}}{1-(.62)^2}$$

whence, $r_{IC} = .496$. That is, with the two correlations given as assumed, the
correlation of the intelligence test with the clerical criterion must lie between the
limits of .496 and .967. If intelligence should correlate as low as .496 with the
clerical criterion, then it measures nothing not already measured by the clerical
test, which by comparison, r = .80, does measure many specific clerical abilities not
measured by the intelligence test.

[Diagram: Clerical criterion, Clerical Test, and Intelligence test, with the
correlation .62 marked between the Clerical Test and the Intelligence test.]
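
The two limits in the footnote can be checked numerically. The following is a minimal sketch in Python, assuming the usual formula for the squared multiple correlation of a criterion with two predictors; the function names are illustrative only and are not part of the original report.

```python
import math


def multiple_r_squared(r_1c, r_2c, r_12):
    """Squared multiple correlation of a criterion with two predictors.

    r_1c, r_2c: correlation of each predictor with the criterion
    r_12:       correlation between the two predictors
    """
    return (r_1c**2 + r_2c**2 - 2 * r_1c * r_2c * r_12) / (1 - r_12**2)


def intelligence_criterion_limits(r_tc, r_ti):
    """Limits discussed in the footnote for r_IC.

    r_tc: clerical test with criterion (.80 in the footnote)
    r_ti: clerical test with intelligence test (.62 in the footnote)
    """
    # Lower limit: adding the intelligence test leaves R^2 equal to r_tc^2.
    lower = r_tc * r_ti
    # Upper limit: the larger root of the equation R^2 = 1.
    upper = r_tc * r_ti + math.sqrt((1 - r_tc**2) * (1 - r_ti**2))
    return lower, upper


lower, upper = intelligence_criterion_limits(0.80, 0.62)
print(f"{lower:.3f} {upper:.3f}")
```

Substituting either limit back into `multiple_r_squared` returns .64 and 1.00 respectively, which are the two conditions stated in the footnote.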
TABLE XXVIII


DISTRIBUTION OF SCORES OF SEPARATE TESTS OF TEST C-2, IN THE CASE
OF 201 PUPILS OF 7A TO 8B GRADES INCLUSIVE. PUBLIC SCHOOL J


 TESTS 1+2+3        TEST 4         TEST 5         TEST 6

 Score     Fr.    Score  Fr.     Score  Fr.     Score  Fr.
  70- 76     1       0    39        0    22        0    33
  77- 83     0       1   [?]        1     5        1     8
  84- 90     1       2     6        2     4        2     8
  91- 97     4       3     3        3     3        3    20
  98-104    10       4     5        4     8        4    19
 105-111     7       5     3        5    11        5    20
 112-118     9       6     3        6     5        6    24
 119-125     7       7     8        7     6        7    21
 126-132    13       8    13        8     6        8    16
 133-139    15       9    10        9     6        9    20
 140-146    19      10    21       10     9       10    15
 147-153    20      11    15       11    11       11    12
 154-160    15      12    19       12     9       12     8
 161-167    14      13    14       13    19       13     3
 168-174    19      14    15       14     7       14     1
 175-181    17      15    10       15     9       15     0
 182-188     8      16    10       16    11       16     1
 189-195     5      17     3       17     9       17     1
 196-202     4      18     0       18     9       18     0
 203-209   [?]      19     0       19     4       19     1
 210-216     0      20     0       20     6               —
 217-223     1      21     0       21     1              201
             —      22     1       22     0
           201      23     1       23     2
                           —       24     2
                         201       25     5
                                   26     2
                                   27     3
                                   28     2
                                   29     1
                                   30     4
CHAPTER VI
FURTHER EXPERIMENTS WITH WORKERS


THE RESULTS OF TESTS GIVEN TO EMPLOYEES OF COMPANY W


Three tests (C-1, C-2, and the Stenquist Assembly Test) were 
administered to a selected group of 73 employees of Company W. 
This company is a large woolen goods manufacturing concern, 
and consequently has many clerical employees who do routine 
work in checking up the progress of patterns and orders through 
the mill. 

The subjects for testing were selected from the entire clerical
group of approximately 250 persons, as representing all degrees of
clerical ability employed in the company. The subjects were
rated by their immediate supervisors in rank order of ability.
After the several rank orders had been obtained, they were combined
into one rank order by the assistant personnel manager on
the basis of a knowledge of the ratings of those persons who were
known to two or more supervisors. The over-all ranks were
transmuted to σ's and these to convenient integral positive
numbers by the arbitrary formula, J = 7.5 + 3σ, the decimals of J
being dropped. This formula does not change the correlations,
but yields J scores ranging from 0 to 15. This over-all transmuted
ranking of the entire group of employees is known as the
“over-all criterion.” In the main office, there were 37 employees
who were well known to one supervisor. The ratings of this
supervisor will be known as the “main office criterion.”
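
The transmutation described above can be sketched as follows. The text does not say how ranks were converted to σ's; the sketch assumes, purely for illustration, that each rank is placed at the midpoint of its percentile interval and converted through the inverse normal curve. Only the formula J = 7.5 + 3σ with decimals dropped comes from the text; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def rank_to_j_scores(n):
    """J scores for ranks 1 (best) through n (poorest)."""
    scores = []
    for rank in range(1, n + 1):
        # Assumed step: percentile position at the midpoint of the rank.
        p = (n - rank + 0.5) / n
        sigma = NormalDist().inv_cdf(p)
        # From the text: J = 7.5 + 3 sigma, the decimals of J being dropped.
        scores.append(math.floor(7.5 + 3 * sigma))
    return scores


j_scores = rank_to_j_scores(73)
print(j_scores[0], j_scores[36], j_scores[-1])
```

Under this assumption the 73 ranks of the over-all criterion all fall inside the 0 to 15 range mentioned in the text, with the middle rank receiving a J of 7.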

The C-1 Clerical Test was administered to all the employees 
under standard conditions. The C—2 Test was also administered 
to all of the employees, but through a mistake in the adminis- 
tration, the time limits of Tests 1 and 2 were increased each by 
one minute, and that of Test 3 by two minutes, so that the total 
test was given a twenty-six-minute time limit instead of the usual 
twenty-two-minute time limit. The Stenquist Assembly Test 
was administered to all but twelve of the employees. Because 
of lack of time, one group of the two groups tested was allowed to 
spend only twenty-two minutes on the Stenquist Assembly Test. 
The scores of these subjects were increased by arbitrary scoring 
formulae to approximately the scores the subjects would have
attained had they been allowed to take the test for the full thirty
minutes.¹


Table XXIX shows the correlations of the I.E.R. Clerical Test C-1,
the Thorndike Clerical Test C-2, the Thorndike Clerical Test C-2
with Test 5 taken out, and the Stenquist Assembly Test (with the
scores corrected in the case of the short time limit group) with the
two criteria: the main office criterion and the over-all criterion.
For the over-all criterion and for the Stenquist Assembly Test the
results are further subdivided into men, women, and both combined.


1 The Method of Supplying Scores on the Stenquist Assembly Test in Company W.
One group of subjects at Company W was given the Stenquist Assembly Test with a 
time limit of twenty-two minutes instead of the usual thirty minutes. It became 
necessary to make an adjustment of such scores to make them as comparable as may 
be with those given with the full-time limit. 

The ideal method in such cases would be to give to some group the Stenquist test 
and to determine their scores at the end of each successive five minutes. By corre- 
lating the scores at each of these points with the total score one could determine the 
proper regression equation to use, along with interpolation, to determine a good 
approximation to the score which would be obtained in the total time. This corre- 
lation coefficient naturally becomes higher and higher as the total time limit is 
increased, ultimately becoming a perfect correlation. Consequently, if the time at 
which the shorter-time group stopped the test is in the neighborhood of the total 
time, there will be very little regression of the shorter-time scores upon the average of 
the 30-minute test scores. 

In this case such regression equations were not available. The plot of the scores 
of men and women separately shows a very marked lack of overlapping. After 
correction by the method below, the average Stenquist Assembly Test score of the 
men was 67.1; that of the women was 36.7; the difference 30.4, is 7.1 times the 
standard error of the difference. This standard error of the difference is 4.3. This 
fact is noted at this point because so large a difference between test scores of men 
and women, working for the most part at the same type of work, is rarely found. 
The overlapping between men and women on the Toops Clerical C—1 and the Thorn- 
dike Clerical C—2 is almost perfect. It seems certain that when people are stopped 
at 22 minutes on a test there will be greater improvement in the remaining 8 minutes 
in the case of those people who make above average scores in 22 minutes. Our 
arbitrary correction formula should take account of this fact. The average score 
made by the men who took the test for 30 minutes was 117 per cent of the score made 
by the men who took the test for 22 minutes; similarly the average score for the 
women who took the test for 30 minutes was 116 per cent of the score made by those 
who took the test for 22 minutes. The desired differential between the “above
average” and “below average” people can be roughly secured by the following
formulae which we have adopted.

Arbitrary formulae for raising the 22-minute scores.

To correct the men’s scores:

    Raised score = 117.24% × 22-min. score,
        +5% if 56 or above in score = 122.24%
        −5% if 55 or below in score = 112.24%

To correct the women’s scores:

    Raised score = 115.62% × 22-min. score,
        +3% if 28 or above in score = 118.62%
        −3% if 27 or below in score = 112.62%
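
The arbitrary formulae above can be written as a small routine. This is a sketch only: it assumes that the cut-off (56 for men, 28 for women) is applied to the 22-minute score itself, which the footnote does not state explicitly, and the function name is illustrative.

```python
def raise_22_minute_score(score, sex):
    """Estimate a 30-minute Stenquist score from a 22-minute score."""
    # (base per cent, differential per cent, cut-off score) as given in the footnote.
    base, differential, cutoff = {
        "men": (117.24, 5.0, 56),
        "women": (115.62, 3.0, 28),
    }[sex]
    # Above-average subjects receive the larger multiplier.
    if score >= cutoff:
        percent = base + differential
    else:
        percent = base - differential
    return score * percent / 100.0


print(round(raise_22_minute_score(60, "men"), 1))
print(round(raise_22_minute_score(20, "women"), 1))
```

A man scoring 60 in 22 minutes is thus credited with 122.24 per cent of 60, and a woman scoring 20 with 112.62 per cent of 20.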
In the case of the entire group of 73 men and women, the C-1
clerical correlates with the over-all criterion to the extent of
.40 ± .07, and the C-2 clerical (with Test 5 out) to the extent of
.38 ± .07. The two combined by regression give a correlation of
.42 ± .07 with the criterion.

Inasmuch as a computation of the correlations both with the 
rank orders and with the transmuted criterion scores (or scores 
which were changed into terms of amount) gave results about the 
same in each case, it seems scarcely worth while to make similar 
transmutation for the cases who were rated by one supervisor in 
the main office. Correlations of the tests with the rank orders of 
the main office criterion are substantially of the same magnitude 
in this group as for the over-all criterion group, with the exception 
that the Stenquist Assembly Test correlates .51 ± .08 with the
main office criterion and only .36 ± .08 with the over-all criterion.
The main office criterion correlates with the over-all criterion to
the extent of .87 ± .03, with N = 37.

These are our most important and most trustworthy results 
from workers. They show a moderate correlation between both 
Tests C-1 and C-2 and actual success on the job (.40 and .38). 
Test C—2 prophesies success in work in this group a trifle better 
than it prophesies average work or half-year gains in the case of 
school children. Tests C-1 and C—2 measure somewhat different 
abilities, the correlation between the two being .71 plus whatever 
increment should be added because of attenuation. The Stenquist
Assembly Test measures notably different abilities, correlating
only .22 ± .08 and .06 ± .08 with Tests C-1 and C-2 respectively
in the case of the entire group. The Stenquist Test is
nearly as significant of success among these workers as is either
clerical test, its correlation with the over-all criterion being
.36 ± .07 in the case of the entire group. This substantial equality of
the Stenquist Test with either clerical test in signifying success 
among these office workers is puzzling, but the only satisfactory 
way to explain it is by further experimentation. 


TABLE XXIX


THE INTERCORRELATIONS OF THE TESTS GIVEN TO EMPLOYEES OF
COMPANY W *

* The Probable Error of the Correlation Coefficients in Table XXIX may be
found from the following table:

                      P.E., WHEN r =
  N     0   ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9
 37   .11   .11   .11   .10   .09   .08   .07   .06   .04   .02
 56   .09   .09   .09   .08   .08   .07   .06   .05   .03   .02
 73   .08   .08   .08   .07   .07   .06   .05   .04   .03   .02


TESTS GIVEN TO TWENTY-ONE COMPANY O APPLICANTS FOR
ADMISSION TO THE FOREIGN TRAINING CLASS SCHOOL


At the invitation of the personnel manager of Company O the
following tests were given to twenty-one applicants for the
Foreign Training Class: Army Alpha, Toops’ Business Test
C-1, Stenquist Assembly Test, Stenquist Picture Tests I and II.
In addition, the following three social facts were obtained for 
each individual: Father’s occupation, age at last birthday, and 
last grade completed in school. Father’s occupation was given 
scores on the following basis: 
    Unskilled or laborer ........ 1
    Agricultural ................ 2
    Skilled trades .............. 3
    Business or clerical ........ 4
    Professional ................ 5


The scores on Form I and II of the Stenquist Picture Test were 
combined by adding the gross scores. 

In addition to the examinations given by the Institute each 
applicant took a very extended examination prepared by the 
company. This examination consisted largely of practical prob- 
lems such as foreign representatives of the company might 
presumably be expected to have to deal with in their foreign 
business relations. The practical problems covered the following 
aspects: 


    Interest on money            Freight
    Tabulation of statistics     Economical operation and costs
    Areas                        Marketing
    Average prices               Transportation
    Yields from investment       Letter writing
    Import duties                Bad accounts
    Letters of application


Inasmuch as these men were minor executives in the various 
sub-branches of the company at the time, these tests presumably 
measure ability akin to trade test ability, or they measure the 
proficiency attained in executive work of the type which is done 
by such men in the sub-offices of the company. The final
percentage ranking given by the company is referred to in the tables
as the “Company’s test ranking.”

The twenty-one men were ranked in order of merit in the four 
tests,—Alpha, Stenquist Assembly, Combined Stenquist Picture 
and Clerical. The intercorrelations, by the rank difference 
method, are given in Table XXX. 
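
The rank difference method can be illustrated with a short sketch. It assumes the usual Spearman formula, ρ = 1 − 6Σd²/(n(n² − 1)), applied to untied ranks; whether any further conversion of ρ was applied in preparing Table XXX is not stated in the text. The ranks in the example are invented for illustration and are not data from the study.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman rank-difference correlation for two sets of untied ranks."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length, n >= 2")
    # Sum of squared differences between paired ranks.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n**2 - 1))


# Hypothetical ranks of five men on two tests.
test_one = [1, 2, 3, 4, 5]
test_two = [2, 1, 4, 3, 5]
rho = rank_difference_correlation(test_one, test_two)
print(round(rho, 2))
```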

It will be seen that the Clerical Test, Toops Over-all Ranking, 
Education, Army Alpha, and Combined Stenquist Picture Test, 
in descending order, all correlated above .50 with the Company’s
test ranking. By the use of the multiple ratio correlation technique
it was determined that the selection of applicants could be
made on the basis of the Clerical Test, Age, Education, and Father’s
Occupation, which together would correlate to the extent of
.71 ± .07 with the Company’s test rating. This examination would
require 37 minutes.


TABLE XXX


CORRELATIONS BETWEEN TESTS, COMPANY O, N=21. MARCH 9, 1922 *

* The Probable Error of the Correlation Coefficients in Table XXX may be found from
the following table:

                      P.E., WHEN r =
  N     0   ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9
 21   .15   .15   .14   .13   .12   .11   .09   .08   .05   .03


† A provisional summation of the ranks of all tests except the company’s test ranking.
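
The probable errors quoted with the correlations in this chapter agree with the customary formula P.E. = 0.6745(1 − r²)/√N. The sketch below assumes that this is the formula behind the footnote tables; the text itself does not state it, and the function name is illustrative.

```python
import math


def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)


# A few values quoted in the text, recomputed to two decimals.
print(round(probable_error(0.71, 21), 2))   # Company O multiple correlation
print(round(probable_error(0.40, 73), 2))   # C-1 with the over-all criterion
print(round(probable_error(0.16, 307), 2))  # Company I new examination
```

Each recomputed value rounds to the probable error printed beside the corresponding coefficient.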


As far as it seems safe to predict from the meagre data on hand, 
the traits, as measured by the tests, in decreasing importance 
required of an applicant for the foreign training class are: 


1. A high order of clerical ability. 

2. A high-school or college education. 

3. That his father be engaged in a highly skilled trade or business, 
or preferably in a clerical or professional occupation. 

4. That the applicant be a young man rather than an older one. 
The multiple ratio importances (β’s) of the traits in order are as
follows:

                                     β
    Clerical Test ................. 1.00
    Education ..................... [illegible]
    Father’s Occupation ...........  .52
    Age ........................... −.60


When these β weights are divided by the respective standard
deviations, they become the gross score weights of the tests.
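
The conversion of β weights into gross score weights can be sketched as below. The β values are the legible ones from the list above; the standard deviations are hypothetical figures chosen only to show the arithmetic, since the report does not give them here.

```python
def gross_score_weights(betas, standard_deviations):
    """Divide each beta weight by the standard deviation of its measure."""
    return {name: betas[name] / standard_deviations[name] for name in betas}


betas = {"Clerical Test": 1.00, "Father's Occupation": 0.52, "Age": -0.60}

# Hypothetical standard deviations, for illustration only.
standard_deviations = {"Clerical Test": 20.0, "Father's Occupation": 1.0, "Age": 4.0}

weights = gross_score_weights(betas, standard_deviations)
for name, weight in weights.items():
    print(f"{name}: {weight:+.3f}")
```

A measure with a large spread of raw scores thus receives a proportionally smaller multiplier, so that its influence on the composite follows its β rather than its raw scale.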


GROUP DIFFERENCES IN TEST C-1


It was found possible to compare the distributions in total 
score of some 1704 people who have been given the C-1 Clerical 
Test at various times and under various circumstances. The 
results of such a comparison are shown in Table XXXI, where 
the groups are arranged in descending order of average score made 
when the tests are weighted by the general series of weights.

It is necessary to point out a few factors making for lack of 
absolute comparability in all cases. The Cornell University 
students, or Summer School students in a course in mental 
measurements, were for the most part principals and superin- 
tendents in New York State schools. They took the printed form 
of the test, as did also the Ac Business College accountants and 
the Company W group; all other groups took the mimeographed 
form. There may be some slight advantage gained by those 
taking the printed form. The B-C Business College students, 
typists, bookkeepers, and stenographers, the B-M Business Col- 
lege students, and the B-D Business School students, typists, 
stenographers, and bookkeepers took the mimeographed form of 
the test with the original experimental time limits which varied 
from the final time limits in the case of six of the ten tests making 
up the clerical scale. Inasmuch as the original total time was 38 
minutes compared with 37 minutes on the other groups, and 
inasmuch as the tests which varied markedly from the final time 
limits received about the same gross score weights, it seems likely 
that the comparison is a fair one. Company I, Company W, 
Company O, the J. R. High School groups, Cornell University 
students, and the boys and girls of the public school groups 
received the test under the standard time conditions. Of these, 
the Company I employees, the Company O employees, the J. R.
High School groups, and the boys and girls of the public school 
groups had the test in the mimeographed form as well as under the 
standard time conditions. 

It is interesting to note the almost entire lack of overlapping 
between the public school boys and girls of ages 12 to 15 inclusive 
with the business college students studying accounting, the Com- 
pany O minor executives, and their rather slight overlapping also 
with the B-C Business College typists, bookkeepers, and stenog- 
raphers. The B-D Business School pupils were soldiers in the 
E and R Schools of the Army in 1920 and are for the most part 
clerical workers who were taking a little army business school 
training in order to better fit themselves for their clerical duties in 
army work. 

There is no doubt but that the different clerical occupations 
are approximately in their correct order of general business ability, 
and that accountants should rank higher than typists, bookkeep- 
ers, or stenographers of a like commercial college, such as B-C 
Business College, and also that they should in general rank higher 
in general business ability than clerical workers employed by 
Company I, and finally that these should be superior to unselected 
groups of boys and girls in the public school. This series then
serves in a rough way as a series of categorical steps in general
business ability. The amounts of difference in general business
ability between groups are unknown, as is the exact ranking of
the groups, although it is known that in general the groups are
approximately in their correct positions. The
fact that the school boys and girls are at one end and the high 
school and university students are at the other end with almost 
total lack of overlapping proves that the clerical test correlates 
very highly with intelligence and academic success. The hier- 
archical order among the business occupations likewise demon- 
strates that the test does distinguish between the different levels 
of business ability. However, there is nothing in this to indicate 
that the C-1 Clerical Test is other than an extremely good 
intelligence test which at the same time correlates well with the 
different levels of general business ability. Inasmuch as most of 
the business groups are people who are learning business, and 
inasmuch as the average ages of these groups indicate that they 
are adults, the obvious conclusion is that the C-1 Clerical Test 
in the grade school at the ages 12 to 15 inclusive does not predict
later business capacity any better than would any other good 
general intelligence test made up of a considerable number of 
rather non-verbal elements. It may well be that general intelli- 
gence, which functions so highly in the acquisition of success in 
grade school work, functions equally well in the acquirement of 
those things considered essential in a business college. Likewise, 
a high degree of general intelligence may make possible a short 
learning period for acquiring proficiency in a business occupation; 
it may even be a minimum essential for entrance to some of the 
higher level clerical occupations. Once in such an occupation 
with the prerequisite amount of preliminary training, it may be 
that additional increments of general intelligence function to no 
special advantage in the routine work of such occupations but are 
of use in emergencies, and save the time of the executive when 
verbal directions are to be understood and carried out. This 
does not preclude the possibility of all higher degrees of intelli- 
gence being very desirable as minimum qualifications for entrance 
to still higher level business occupations or to executive capacity 
in such occupations. 


GROUP DIFFERENCES IN THE STENQUIST ASSEMBLY TEST SCORES


The results of the Stenquist Assembly Test which was given 
uniformly to a number of groups, the averages and the standard 
deviations of the distributions, are shown in Table XXXII. It 
is interesting to note that we find here a hierarchy of test scores 
corresponding very much to the hierarchy found in the clerical 
tests. The groups of men are arranged roughly in an order of 
general intelligence, although there is one marked exception: Com- 
pany O men are superior to Company E or Company W men in 
general intelligence; the Company O men have had little mechan- 
ical experience, their fathers consisting of a larger percentage of 
clerical people than any of our other groups. The girls of the 
public school are markedly inferior to the boys on the Stenquist 
Assembly Test. The average fifteen-year-old girl on the Sten- 
quist Assembly Test is not the equivalent of even the average 
twelve-year-old boy. The men of each group in which both men 
and women were tested made markedly higher Stenquist Assem- 
bly Test scores than the women of the same group, although in 
each of these groups the two sexes differ but little on other tests,
as is usually the case where intelligence or clerical tests are given 
to men and women in the same school courses or same occupa- 
tional work under common working conditions. 


TABLE XXXII


DISTRIBUTION OF SCORES ON THE STENQUIST MECHANICAL ASSEMBLY TEST,
BY GROUPS TESTED


APPENDIX I


INSTRUCTIONS FOR ADMINISTERING, SCORING,
AND RECORDING THE TESTS


THE I.E.R. ARITH.-RE. TEST


Directions for Administering Part I of the Test (Thorndike-Mc- 
Call Reading Test): The directions for administering and scoring 
the Reading section of this test are those prescribed in the 
standard instructions issued with the test blanks. The T-score is 
used. Time allowed: 30 minutes. 

Directions for Administering Part II of the Test (Arithmetical 
Problems): After the papers have been distributed, face down 
on the pupils’ desks, and the name, grade, and age of the child 
written at the top of the sheet, read the following directions: 

“These are some problems in arithmetic. Write the answers
to the problems on the blank lines at the right-hand side of the 
page. Use your extra blank sheet to figure on.” 

Time allowed: 15 minutes. 

Scoring: Score is number of answers correct. 

Weighting of the Arith.-Re. Test. See Chapter II, page 12. 
The weighted score is the T-score in the Reading Test plus three 
times the number of right answers in the Arithmetic Test. 
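
As a brief illustration of the weighting rule just stated, the sketch below computes the weighted Arith.-Re. score. The function name and the sample figures are illustrative only and are not taken from the report.

```python
def arith_re_weighted_score(reading_t_score, arithmetic_right):
    """Reading T-score plus three times the number of right arithmetic answers."""
    return reading_t_score + 3 * arithmetic_right


# Hypothetical pupil: Reading T-score of 50 and 10 arithmetic problems right.
print(arith_re_weighted_score(50, 10))
```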


I.E.R. GENERAL CLERICAL TEST, C-1


Directions to Examiners 

1. See that all subjects are provided with two sharpened pencils 
or one long one sharpened at both ends. 

2. Distribute the papers face up on the desk before each subject. 
As the examiner starts distributing the papers he says: 

“Start at once filling out the answers to the questions on the 
front page. First, fill out the blanks at the top, and then 
answer every question beneath. Do not turn over the page 
until I tell you to.”

3. The examiner then explains in detail how to fill out each 
blank. (Examiner gives numbers in case any are to be omit- 
ted.) He passes around the room and examines each sub- 
ject’s paper in turn, helping the subjects to fill out the blanks 
wherever necessary. Have the class instructor aid in this. 

4. The examiner then reads the following General Directions: 

General Directions


“This examination is intended to test your ability in some 
of the simple operations required in clerical work. 

“There are a few general directions which will apply 
throughout the test. Special directions are printed at the top 
of each test for you to read before doing the test. Now, look at 
your general directions on the front page while I read them.”


The examiner pronounces the numbers 1, 2, etc., as he comes to them.

“Always wait for the signal before turning the page to a
new test. Whenever I tell you to do so, turn over
quickly to the next page. My signal will be like
this: ‘Turn over to Test 4. Begin!’ That means
that you are to stop Test 3 at once, turn over to
Test 4, and begin immediately.”

“You are expected to do all you possibly can on every
test. Most of the time you will not be able to
finish a test before the signal comes to turn over to
the next test. Do not be discouraged if you are not
able to finish any of the tests in the time allowed.
Few people are ever able to finish the tests.”

“Stop promptly at the ‘turn over’ signal, and begin the
next test at once. There is no waiting between the
tests. Even if you should get tired, do not let that
keep down your speed.”

“Follow exactly the printed directions. After you
have once begun the first test, no spoken directions
will be given. These tests all require you to understand
printed directions, as well as to do certain
things.”


“Ask no questions. I will not answer any, and it
will disturb others for you to ask them.”

“Both speed and accuracy count. Work as rapidly as
you can without making mistakes. Neatness does
not count if you write so that we can read it.”

“Cross out your answer and correct it if you make a
mistake. Don’t try to erase, as that would take
too much time. You can usually answer two more
questions while you would be erasing one mistake.
Try to be correct in your first answer.”

“Always do some practice work on the practice pages.”

“Do not skip around. Usually the easier questions
come first, and count just as much as the harder
ones further along in the test.”

“Do your scribbling and figuring on the margins of the
pages.”


“Check the correctness of your work if you get through
before the ‘turn over’ signal.”

“You may refer to the directions any time you care to.
However, you will waste time in doing so. Try to
keep in mind just what you are to do; master the
directions, and begin as quickly as possible.”

“Guess, when you are not sure of an answer. Don’t
try to answer any complete test by guessing, and
guess only when you are not sure of the answer to
any question. You may raise your score more by
guessing than by leaving the answer blank. A
blank is just as bad as a wrong answer.”

“If your pencil breaks, raise your hand and call
‘Pencil.’ Otherwise you will not speak.”

“Now go into this test as you would into a foot race
or into a basketball game, and you will make your
best score.

“Ready: Turn over to Test 1. Begin!”


Continue the schedule as directed in the “Time Administration
Sheet.” An ordinary watch set to 12 hours, 0 minutes,
and 0 seconds is satisfactory for keeping time if a stop watch is
not available.


The examiner must not give any further directions at any point. 
The directions are a part of the working time. To give more 
directions for any test than those allowed and provided for in the 
“Time Administration Sheet” means depriving subjects of time
needed on the test. The examiner should take about 24 seconds
to give the direction, “Turn over to Test 4. Begin!”


TIME ADMINISTRATION SHEET OF THE I.E.R. CLERICAL TEST C-1
At the end of:
books.”
21. Collect the papers quickly. 


DIRECTIONS FOR SCORING THE 10 UNIT TESTS OF I.E.R. C-1


Preliminary Directions 


1. a. Check all papers to see that the name is on each of them. 

b. Check to see if any essential information of the question- 
naire on the first page is lacking. If so, supply it, or else 
discard the paper. 

c. Sort and classify papers, if groups or classes are together. 
Tie up in separate bundles with class labels written on a 
paper firmly attached to each bundle. 

2. General Procedure. 

a. On each test determine the last question attempted. This 
means the last full question completed. 

b. Draw a line under the question number of the last question 
attempted. 

c. Record the number attempted after “A” in the scoring box,
at the lower right-hand corner of the page.

N.B. The final question “attempted” must have had some
reaction made to it as indicated by a mark of some sort.
If only a part of a question is attempted, do not call it an
attempt. Example: Subject on Test fills out first two
digits of number 20 row. The number of attempts is thus
19, and a line is drawn under the 19-question number, and
“19” is entered after “A” of the scoring box.

The attempts are always the maximum number of
“Rights” which the subject could have received on his
work. Do not confuse this with the maximum score on
the test given by the work-limit method.

d. Refer to the special scoring directions below before proceed- 
ing with the scoring of any particular test. It has been 
found economical where several people are working to- 
gether to score all copies of a single test at one time before 
proceeding to a second test. This would mean that it is 
best, if you have ten booklets, for example, for one person 
to score all ten for Test 1, another person all ten for Test 2, 
and so on. 

e. Make a short horizontal dash through all errors and omis- 
sions and also through the staggered question numbers at 
the right, to show clearly the exact location of all errors (W). 
Use blue or red pencils to mark the errors. Make a short, 
horizontal dash, not a long, sweeping line. 


Do not mark the correct answers in any way. 

Omissions count just the same as errors, save in Tests 1, 9, 11,
20, and 28, where whole columns may be omitted without count- 
ing either as attempts or errors. Miscellaneous skipping around 
will be severely discounted by counting all omissions as errors. 

In all choice tests (true-false, etc.) two or more underlined
answers to a question, indicating uncertainty or choice, count
automatically as an error.

In counting errors the general rule is to count as wrong any 
ambiguous marking or any case where there is reasonable doubt as 
to the answer intended. By adhering to this rule, all papers are 
marked uniformly severely. 

In case two answers are given, the score is an error unless an at- 
tempt has obviously been made to erase or cross out one of them. 


Special Scoring Directions 


Test 1.¹ Score of attempts equals the serial number of the last
addition checked, readily determined by the scoring stencil which
has the attempts numbered. Maximum Rights=120.

Count as A the number of attempts shown by the stencil;
count as W all arithmetical errors and omissions. The score (R)
is A − W.
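
The rights-from-attempts rule used throughout these unit tests might be sketched as follows. The function name, the range checks, and the sample figures are illustrative assumptions; only the relation R = A − W and the Maximum Rights value for Test 1 come from the directions.

```python
def unit_test_rights(attempts: int, wrongs: int, max_rights: int) -> int:
    """Rights (R) = attempts (A) minus wrongs (W); W counts errors and omissions."""
    if not 0 <= attempts <= max_rights:
        raise ValueError("attempts must lie between 0 and the maximum rights")
    if not 0 <= wrongs <= attempts:
        raise ValueError("wrongs must lie between 0 and the number of attempts")
    return attempts - wrongs


# Hypothetical paper on Test 1 (Maximum Rights = 120): 64 attempts, 5 wrong.
print(unit_test_rights(attempts=64, wrongs=5, max_rights=120))  # 59
```

The range checks are a convenience added here to catch clerical slips in recording A and W; the directions themselves do not prescribe them.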

N.B. The above directions refer to the printed form. In the 
old form given a gross score weight of 3 in this investigation, the 
wrongs only were marked; hence the gross score weight in the new 
printed Test 1 is only 14. 

Test 9. Score of attempts is the serial number of the last 
price coded. Maximum Rights=60. Any code consistently 
used throughout the test may be allowed. An entirely new or 
original code used may be allowed if consistently and correctly 
used. One or more errors in a coded price makes that price an 
error. 

Test 11. Score of attempts is the serial number of the last line 
filled out. Maximum Rights=50. Numbers must be correct in 
every digit. No partial credits allowed. 

Test 13. Score of attempts is the serial number of the last 
blank filled out plus any omissions which have occurred. Maxi- 
mum Rights=16. Writing the name of the fruit instead of giving 
its number is allowed. If both numbers and names are given, 
score according to the numbers. 

Test 16. Score of attempts is the serial number of the last line 
marked. Maximum Rights=30. General directions apply. 
Writing the correct word is allowed. 

Test 20. Score of attempts is the serial number of the last 
blank filled out. Maximum Rights=105. 

Test 24. Score of attempts is the serial number of the last one 
filled out. Maximum Rights=20. Spelling does not count if 
the word is not ambiguous. 

Test 25. Score of attempts is the serial number of the last 
answer underlined. Maximum Rights=30. Two underlinings 
to one question count as an error. 


¹ Test numbers are the Unit Test numbers always printed just to the left of
the scoring box.

Test 26. Score of attempts is the number of 10’s which might
have been encircled up to and including the last one marked. 
Maximum Rights=50. There are from 2 to 5 attempts in each 
row. 

The figure at the right-hand margin of each row keys the number
of circles which should be in that line. Check by inspection
to see that all circles are around a “10” group and that there is
the proper number of circles in each row. In case of an actual 
marked error draw a line through the error and put one dash 
beside the number at the right-hand edge. Make a similar dash 
at the right-hand edge for an omission. 

The cumulative numbers in parentheses at the left are for 
convenience in finding attempts. For instance, if the first two 
10’s of line 35 are encircled, the attempts are 37. The wrongs are 
the number of marks placed at the right-hand edge. No stencil 
is needed. 

Test 28. Score of attempts is the serial number of the last pair 
of parentheses filled. Maximum Rights=40. If done in lines
across the page the lower groups are not counted as omitted. 


I.E.R. TEST C-2


Directions for Administering 


Distribute a pencil, test blank, and directory to each pupil. 
Have pupils fill in name, address, date, age and school grade in 
the proper spaces on the front of the test blank; then read the 
following directions: 

“When I say ‘Begin,’ open your books and begin on Test 1.
When you finish Test 1, go right on to Test 2 without stopping,
and then do Test 3 without stopping, and so on. Do not wait,
but go right on from one test to the next. At certain times I shall
tell you to begin on a new test even if you have not already begun
it. Keep at work all the time. The words at the top of each test
tell you what you are to do. You are to go ahead as fast as you
can, working accurately.

“Ready! Turn over to Test 1. Begin!”


CUMULATIVE TIME ADMINISTRATION SHEET 


At the end of 0 min. say, “Ready! Turn over to Test 1. Begin!” (2)

At the end of 2 min. say, “Even if you haven’t finished Test 1, begin now
on Test 2. Test 2.” (2)

At the end of 4 min. say, “Even if you haven’t finished Test 2, begin now on
Test 3. Test 3.” (3)

At the end of 7 min. say, “Even if you haven’t finished Test 3, begin now on
Test 4. Test 4.” (3)

At the end of 10 min. say, “Even if you haven’t finished Test 4, begin now on
Test 6. There is no Test 5.” (6)

At the end of 16 min. say, “Stop! Close your books.”
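
The parenthesized figures on the sheet appear to be the minutes allowed for each test, so the cumulative signal times can be checked by running totals. The sketch below assumes that reading; the dictionary and function names are hypothetical.

```python
from itertools import accumulate

# Minutes allowed per test, read from the parenthesized figures on the sheet.
# There is no Test 5.
MINUTES_ALLOWED = {"Test 1": 2, "Test 2": 2, "Test 3": 3, "Test 4": 3, "Test 6": 6}


def signal_times(minutes_allowed):
    """Return (start minute of each test, minute at which to call 'Stop')."""
    ends = list(accumulate(minutes_allowed.values()))
    starts = [0] + ends[:-1]
    return dict(zip(minutes_allowed, starts)), ends[-1]


starts, stop_at = signal_times(MINUTES_ALLOWED)
for test, minute in starts.items():
    print(f"At the end of {minute} min. begin {test}")
print(f"At the end of {stop_at} min. stop")
```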


The Scoring of I.E.R. Test C-2


On Test 1 (underscoring of letters), count as attempts the num- 
ber of letters that should have been underlined up to and including 
the last letter underlined. Give as the score the number of at- 
tempts minus errors (actual errors and omissions) up to that 
point. 

On Test 2 (underlining one digit), if subject has worked in 
columns, count as attempts the number of digits that should have 
been underlined in the columns up to and including the last digit 
underlined. Give as the score the number of attempts minus er- 
rors (actual errors and omissions) up to this point. If subject has 
worked in rows, the same method is followed except to count as 
attempts the digits that should have been underlined, considered 
by rows instead of by columns. 

On Test 3 (underlining groups of digits), if subject has worked in 
columns, count as attempts the number of groups of digits that 
should have been underlined in the columns up to and including 
the last group underlined. Give as the score the number of at- 
tempts minus errors (actual errors and omissions) up to this 
point. If subject has worked in rows, the same method is fol- 
lowed except to count as attempts the groups which should have 
been underlined, considered by rows instead of by columns. 

On Test 4 (same and different numbers), if subject has worked 
in columns, count as attempts the number of pairs that should 
have been underlined in the columns up to and including the last 
pair underlined. Give as the score the number of attempts 
minus errors (actual errors and omissions) up to this point. If 
subject has worked in rows, the same method is followed except to 
count as attempts the pairs that should have been underlined, 
considered by rows instead of by columns. 

On Test 6 (copying addresses) count as attempts the last address
copied and give as the score the number of attempts minus
errors (actual errors and omissions) up to that point. (The two
addresses given at the top as samples are not included in either
attempts or rights.) If home addresses are given consistently
throughout instead of New York City addresses, as required, give
as final score 90 per cent of the number of home addresses correctly
given, most easily determined by deducting 10 per cent,
recording result to nearest whole number. If both New York
City addresses and home addresses are given, compute the score
on the basis of the New York City addresses only.¹

¹ This is in accord with our general scoring principle that if the subject does more
than is required, and so penalizes himself by using up the time allotted for the test,
he is not penalized for including the additional parts of his answer.

Weighting the Tests: (Add the scores on Tests 1, 2 and 3. Multiply
the gross scores on Test 4 by 4. Multiply the gross scores
on Test 6 by 10. Add the three quantities together for the subject’s
weighted score on Test C-2. For the derivation of weights
used, see Chapter V, pp. 96-97.)
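
The C-2 scoring and weighting rules can be collected into a short sketch. The function names and sample scores are hypothetical. Rounding an exact half upward in the home-address rule is an assumption, since the text says only "to nearest whole number."

```python
def attempts_minus_errors(attempts: int, errors: int) -> int:
    """Score on one C-2 test: attempts minus errors (actual errors plus omissions)."""
    if attempts < 0 or not 0 <= errors <= attempts:
        raise ValueError("need 0 <= errors <= attempts")
    return attempts - errors


def home_address_score(correct_home_addresses: int) -> int:
    """Test 6 when home addresses were given throughout: 90 per cent of the
    number correct, to the nearest whole number (halves rounded up: assumption)."""
    return (9 * correct_home_addresses + 5) // 10


def c2_weighted_score(test1: int, test2: int, test3: int, test4: int, test6: int) -> int:
    """Sum of Tests 1-3, plus 4 x Test 4, plus 10 x Test 6."""
    return (test1 + test2 + test3) + 4 * test4 + 10 * test6


# Hypothetical subject with gross scores of 40, 30, 20, 15 and 8.
print(c2_weighted_score(40, 30, 20, 15, 8))  # 90 + 60 + 80 = 230
print(home_address_score(12))                # 10.8 -> 11
```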


STENQUIST MECHANICAL TEST


Directions for Administering 


The Series I of the Stenquist Assembly Test of Mechanical
Ability was used in this investigation. This series consists of the
following ten articles: Cupboard catch, clothes pin, Hunt paper
clip, chain, bicycle bell, shut-off, wire stopper, push button, lock
No. 1, and mouse trap. This series is for sale by the C. H. Stoelting
Co., Chicago, Illinois. The procedure followed in administering
the test was that recommended by Dr. Stenquist in a manuscript
entitled “Stenquist Assembly Test of General Mechanical
Ability, Description and Manual of Directions,” issued from the
Board of Education of the City of New York. The scoring sheet
therein recommended was slightly altered in order to make it
more useful in quickly recording the scores secured by the subject.

The revised directions for the use of these tests are now available
in a booklet entitled “Stenquist Assembling Test of General
Mechanical Ability, Description and Manual of Directions,”
published by the C. H. Stoelting Co., Chicago, Illinois. The
administrative directions used in this investigation are as follows:


1. Distribute one Scoring Sheet and one box, with hinges toward 
the pupil, to each pupil. Caution the pupils as follows: 

“Do not open the boxes until I tell you to do so.” 

2. “Now write your name at the top of the page where it says
‘name,’ and then fill in the other blanks.” (Examiner walks
around room explaining how to do it wherever necessary.)
“Now, fold your papers once lengthwise (illustrating) like this,
and put the paper under your box.”

3. “Look at the directions on the lid of the box while I read 
them. In this box there are some common mechanical things 
that have all been taken apart. You are to take the parts and 
put them together as they ought to be; that is, you are to take 
the parts and put them together so that each thing will work 
perfectly. 

“Do not copy what your neighbor is doing, but work ab- 
solutely by yourself. Keep the box turned so that the hinges 
are toward you. When opened in this position the cover forms 
a tray in which to work. 

“Do not break the parts. Everything goes together easily
if you do it in the right way. Begin with Model A; then take
Model B; then C; and so on. If you come to one that you
cannot do in about three minutes, go on to the next. The
person who gets the most things right gets the highest score.

“Ready: Begin!” (Examiner sets watch to 12 hours, 0
minutes, 0 seconds.)

4. After 3 minutes, say:

“Do not spend more than about 3 minutes on any one
model.”

5. At the end of 30 minutes, say:

“Stop! Put your paper with your name on it into the box
and close the box.”


Scoring the Stenquist Assembly Test 


At the outset the Stenquist Assembly Test was scored on two 
bases: the first was the partial scoring method, using a mimeo- 
graphed scoring form and scores recommended by Dr. Stenquist; 
the second was on the all-or-none basis, credit being given only for 
a perfect performance of a given model. It soon became quite 
evident that for public school boys and girls of the ages 13-15 
inclusive, the all-or-none basis would result in a great number of 
zero scores which would undoubtedly decrease the validity of the 
test and possibly spuriously affect its reliability. Accordingly, 
the all-or-none basis was discarded. All correlations of the 
Stenquist Assembly Test herein reported are for the raw scores 
determined from the partial scoring method using the mimeographed
form of the scoring blank. This mimeographed form has
been slightly altered for ease in scoring and has been printed by
the Institute of Educational Research under the title, “Stenquist
Mechanical Assembly Scoring Sheet.”


Suggestions for Scoring the Stenquist Assembly Test While 
Administering Paper Tests 


Our procedure in the administration of these tests may be 
interesting to anyone who wishes to give several scales for voca- 
tional guidance purposes to the same subjects. It has been found 
quite possible for two examiners, using the printed scoring sheet, 
to score approximately 30 boxes in 30 minutes. Then in the 
interval of about 10 minutes required for dismissal of the first 
test group and entrance of the second, the two examiners may dis- 
assemble the scored models ready for the second group. This 
procedure would enable a continuous succession of tests of 30 
subjects each to be made every 40 minutes throughout the day. 
However, this would be rather strenuous work for the two examiners,
and it has been found possible to combine other tests
with the Assembly Test to good advantage. The procedure fol- 
lowed is: the subjects during the first 30 minutes take the Sten- 
quist Assembly Test, followed the next 30 minutes by the Thorn- 
dike-McCall Reading Test, which is a test involving no additional 
attention from the examiner once the directions are given and the 
subjects have begun on the test. While the subjects are working 
on the Thorndike-McCall Reading Test, the examiners are busy 
scoring the Stenquist Assembly Test. The Thorndike-McCall is 
then followed by other tests to be given in the general series. 
There is an additional advantage to be secured by giving all tests 
at one sitting, that of insuring that all subjects will have test 
scores complete on all the different tests, a feat which is quite 
impossible if the different tests are given on different days. The 
only restriction here seems to be that one should not have so many 
tests on one day that fatigue will enter. With pupils of ages 13, 
14 and 15, two hours testing does not seem excessive provided 
breathing exercises and a short rest interval are given at the end 
of the first hour. This was done in all cases where our tests were 
given for more than one hour at a time. 


Stenquist Mechanical Aptitude Tests


These tests are paper-and-pencil tests devised to be given to 
subjects in groups and to be used either in the absence of the 
Stenquist Assembly Test or to supplement the Assembly Test. 
They are sold by the World Book Co., Yonkers, New York. 

Both Picture Tests I and II were given to all boys in Public
School B who were 13 years of age or over. Inasmuch as the
standard directions had not yet appeared at the time the tests
were given, the directions used in administering the tests varied
slightly from those recommended in the test manual now published
by the World Book Co. The directions were slowly and
distinctly read to the subjects, allowing plenty of time for them to
note the picture parts referred to in the directions. The individual
coaching of subjects who failed to grasp the idea of the test, recommended
by Dr. Stenquist, was not followed. Such a subject was
merely urged to “figure it out” for himself.

A time limit of 30 minutes on each form was strictly adhered to.
Few subjects of the ages 13 to 15 are unable to finish in this time,
while the majority finish rather earlier. A shorter time limit
would undoubtedly secure as good results. Considerable difficulty
was experienced in Test I by reason of the reversal of the
pictures for the later exercises. The subjects are extremely prone
to turn the paper around and begin on exercise 6 instead of exercise 1,
even though specially cautioned, “Begin on exercise 1, and
not on exercise 6.”


The Weighting of Stenquist Picture I and II 


It seemed desirable to combine the scores on Forms I and II
of the Stenquist Mechanical Picture Test in order to save time in
the computations. The distribution of test scores on each of the
two tests was tabulated for the 467 pupils in Public School B who
had taken both of these tests, and yielded the standard deviations,
σ_I = 13.50 and σ_II = 10.83. If, then, Stenquist I raw-scores be
weighted 4 and Stenquist II raw-scores be weighted 5, the two
tests will be given almost identical partial regression true importances.
The raw-scores, rather than the T-scores, were used.
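
A quick check of the arithmetic behind these weights is sketched below, on the assumption that the aim is to make the two weighted standard deviations nearly equal. The standard deviations and weights are those given in the text; the function name and sample raw scores are hypothetical.

```python
SIGMA_I = 13.50    # standard deviation of Stenquist I raw scores (from the text)
SIGMA_II = 10.83   # standard deviation of Stenquist II raw scores (from the text)
WEIGHT_I, WEIGHT_II = 4, 5


def combined_picture_score(raw_i: float, raw_ii: float) -> float:
    """Weighted sum of the two Picture Test raw scores."""
    return WEIGHT_I * raw_i + WEIGHT_II * raw_ii


# After weighting, the two tests have nearly the same spread.
weighted_sd_i = WEIGHT_I * SIGMA_I      # 4 x 13.50
weighted_sd_ii = WEIGHT_II * SIGMA_II   # 5 x 10.83
print(round(weighted_sd_i, 2), round(weighted_sd_ii, 2))

# Hypothetical boy with raw scores of 50 on Form I and 40 on Form II.
print(combined_picture_score(50, 40))  # 200 + 200 = 400
```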


THE GIRLS’ I.E.R. ASSEMBLY TEST


Directions for Administering the Girls’ I.E.R. Assembly Test 


The test may be given to groups of any size, the size being 
limited only by the number of sets of tests available. Each pupil 
preferably should have a desk top or three feet of horizontal space 
on a table, upon which to work. Separate desks are preferable for
the reason that the pupil is less tempted to watch his neighbor at 
work. 


1. Distribute one test box and one scoring sheet to each pupil. 
Have pupils fill out the information blanks at the top of the 
scoring sheet, and then fold the scoring sheet lengthwise through 
the middle and place beneath the box out of the way. 

2. Instruct the subjects as follows: “When you open your
boxes (wait for the signal), you will find a pair of scissors, a little
box and some envelopes. First, open the little box, which has a
big letter A on it, and string the beads just like the sample. Then
open the envelope marked B and put the parts together so that
they will look just like the sample. Then take the parts in envelope
C and put them together so that they will look just like the
sample. Then do D, E, and F and so on. Do not spend too long
on any one package but work as fast as you can and do your work
neatly. Ready! Open your boxes; find your scissors; and begin
on box A. String the beads so that they will look just like the
sample.”

At this point set your watch at 12 hours, 0 minutes, 0 seconds.
Allow exactly 45 minutes, and give the signal: “Stop! Put all
your parts, together with the scraps, back into the big box and
close your box quickly.”


Scoring the I.E.R. Girls Assembly Test 


Score the test according to the standard scoring sheet which 
may be secured from the Institute of Educational Research, 
Teachers College. In pencil draw a straight line from the credit
given out to the margin and write in the margin the credit allowed.
These are later added up and written at the head of the
sheet on the blank line after the word “Score.” The following
comments, in regard to credits that may not be clear from the
reading of the scoring sheet, will be helpful.


A. Stringing Beads. If the subject’s model is all O.K., give 
him a credit of 10, and disregard the partial scoring. If not O.K., 
give partial credits according to the scoring sheet. (The partial 
credits, which add up to 10, all align in a given column.) Both 
loops must be made to get credit for loops. No credit for one loop 
only. A bow knot receives zero on the partial credit “properly
tied on card.”

B. Inserting Tape. All O.K., give 10. The credits of 8 and 2 
are over-all performances and therefore occur on the printed page 
in alignment with the 10 and not with the partial credits. Thus 
8, 2, and 0 are the only credits allowable. Two holes or more
incorrect receives 0. 

C. Rosette. All O.K., 10. Bow knot, if otherwise O.K.,
receives a credit of 8. 

D. Cross Stitch. All O.K., 10. “Corner” means whether
corner has been turned exactly like sample. If the corner is not 
turned correctly, this means that the subject began the sample at 
the wrong corner. Consequently this is a serious error. Neat- 
ness may be given a subjective score of either 0, 1, 2, or 3, accord- 
ing to the scoring sheet schedule, which shows what credits are to 
be deducted for puckering, for stitches not in the corner of the 
squares, and for the thread being too loose. 

E. Key Ring. All O.K., 10. This is one of the most readily 
scorable of all the tests. Follow the scoring sheet. 

F. Clip Chain. All O.K., 10. “Joined singly” means only
one wire hooked in each case, but all in a straight line. If the 
double wires are hooked, give the credit, if any, which would be 
appropriate for the number of links correctly made. Deduct no 
credit for the chain not being hooked on the card. 

G. Tape Sewing. All O.K., 10. Pull the tape gently at all
points; if the thread pulls out in even a single stitch leaving the 
tape loose, the first four points credit are lost. The remaining 
judgments are all quality judgments and are scored as per the 
scoring sheet. 

H. Trunk Tag. The credit of 5 is given for the difficult feat of 
getting the lower strap through the buckles (that is, the hole 
which is in the middle of the strap being correctly placed on the 
tongue). When held in the left hand with card up and buckle at 
tips of the fingers, the card must be in a position to read. 

I. Card Wrapping. All O.K., 10. If any cross has 4 of the 
cross, or more, inside or touching the pencilled lines count as 
“inside.” Give one point each for each of the crosses correctly
made within the lines. The crosses must be made with the proper 
kind of lapping. 

J. Booklet. All O.K., 10. All the scoring points in this test 
are subjective points, but the test is readily scorable. 

K. Trimming Paper. If cut between the lines, give one point 
each for each of the intervals between two adjoining pairs of num- 
bers that are passed without touching either of the boundary lines. 
One may get credit on the first interval, miss any number of suc- 
ceeding ones, and finally get other credits on more difficult 
intervals. 

If the subject attempts to trace one of the two printed lines, 
then use the qualitative judgments provided on the scoring sheet. 

In the entire scale, give no credit for “partially correct” scoring
items, except in the neatness judgments which are indicated on the
scoring key in serial fashion, “0, 1 or 2.” Each item, except the
neatness judgments, is to be scored on the all-or-none basis.


Directions for Assembling the I.E.R. Girls’ Assembly Test 


The standard specifications for the I.E.R. Girls’ Assembly Test 
are as follows: 
Needleworkers’ scissors, 44 inch, one blade pointed, the other
slightly blunt.


A. Stringing Beads. Model consists of 24 beads of red, blue, 
and yellow, alternating in that order by fours, which are to be 
strung on a pink cord 193 inches long. The cord is lapped once 
back through the two end beads and is then run through holes in 
the card and tied in a hard knot behind. The cardboard is made 
of heavy binder’s board 24 by 32 inches with §-inch holes centered 
Z inch from the ends. The box to contain the model and materials
is a sliding pasteboard box, 23 by 3? by 1 inches, inside measure- 
ments. To assemble, a pink cord 19% inches long is threaded into 
a No. 1 darning needle and is wound once around a blank punched 
card; 24 beads are counted and checked to see that they contain 8 
of a color and are placed in the bottom of the box. On top with
cardboard down is placed the model, coiling the strung beads in a 
rope; then above the model is placed the punched card with 
threaded needle, the needle being underneath and the box is then 
closed with the letter A on top. After scoring, clip the cord on 
the subject’s product, disassemble, recount and recheck the beads, 
rethread the needle and reassemble. The cord is used up and 
must be supplied from the storeroom. 

B. Inserting Tape. The model is constructed of one piece of 
white beading, standard pattern, 15 inches long, in which is 
inserted one piece of white tape 15 inches long beveled to 45° at 
one end, the tape being held in place by a metal eyelet in each end. 
To assemble, place one model, one piece of beading and one piece 
of tape, separated, in envelope B, folding all together two or three 
times in order to prevent undue rumpling. After scoring, disas- 
semble and place back in envelope. 

C. Rosette. Model is constructed of one stiff cardboard 13 inch 
square punched with 8 holes symmetrically arranged around the 
center in a circle 1 inch in diameter, threaded in rosette design 
with one pink cord 15 inches long, tied in hard knot. To assem- 
ble, place model, one unused cardboard and one pink cord 15 
inches long, in envelope C. After scoring, cut the cord and use 
cardboard over again. Cord must be replaced. 

D. Cross Stitch. Model is made up of one square of lavender 
checked gingham, cut with a white border, four dark lavender rows 
of squares each way, sewed with standard cross stitch design. To 
assemble, thread one No. 5 needle with a 15-inch No. 16 black 
sewing thread and knot one strand of the thread; wind threaded 
needle about the small cardboard to keep it from tangling; place 
threaded needle, model, and blank piece of gingham in envelope 
D. After scoring, destroy the subject’s performance, rethread 
needle, replace gingham and reassemble. Both the gingham and 
the thread are used up and must be replaced. 

E. Key Ring. Model consists of one key ring and key properly 
assembled on a heavy piece of cardboard 6} inches by 34. To 
assemble, place model and the four separate parts in envelope E. 
After scoring, disassemble and place again in envelope. 

F. Clip Chain. Model consists of six No. 1 Gem wire paper 
clips, assembled in standard fashion and fastened to dress hook 
supports attached to heavy binder’s board card 6} inches by 33. 
To assemble, place model, one blank card, and six separated clips 
in envelope F. After scoring, disassemble and place again in 
envelope. 

G. Tape Sewing. Model consists of one 64-inch by 34-inch 
piece of white muslin, bound with white tape 4 inch wide of the 
same length, sewed on with No. 16 black sewing thread. To 
assemble, place one model, one piece of muslin, one piece of tape, 
one threaded and knotted No. 5 needle, wrapped on card as in D 
above, in envelope G. After scoring, carefully trim off the taped 
edge and use the muslin in two or three further administrations of 
the test; rethread the needle, and reassemble into envelope. The 
tape and thread are used up each time, and the muslin also after 
about four administrations of the test. 

H. Trunk Tag. Model consists of one trunk tag assembled 
according to standard specifications, and fastened to heavy 
binder’s board 6} inches by 3% inches. To assemble, place one 
model and the five separate parts in envelope H. After scoring, 
disassemble and replace in envelope. 

I. Card Wrapping. Model consists of one stiff binder’s board 
6% by 33 inches with 8 semi-circular holes along each side sym- 
metrically placed according to a standard templet, wound in 
standard design with two pink cords each 34 inches long and 
knotted in a bowknot at each end. The cords supplied to the sub- 
ject are each 34 inches long, and knotted together with a hard 
knot at 3% inches from one end. To assemble, place one model, 
one blank card, and one pair of cords in envelope I, coiling the 
cords by winding them around the hand in order to insert them 
more easily in the envelope. After scoring, either untie the subject’s 
performance, or clip the cord and replace it in case the knot is too 
firmly tied. The cord is ordinarily used up. 

J. Booklet. Model consists of pasted booklet constructed from 
two manila cards 2? by 3} inches, hinges made from a one-inch 
square mucilaged blue paper, cut in three equal parts. To as- 
semble, place in envelope J one model, one manila card 6; inches 
by 3%, and one piece of blue mucilaged paper one inch square. 
After scoring, destroy subject’s performance, and replace manila 
card and mucilaged square. The working material is all used up 
by the subject. 

K. Trimming Paper. Model consists of one cut-out inside part 
of standard cut-out design. To assemble, place one model and 
one blank cut-out sheet in envelope. After scoring, destroy sub- 
ject’s performance and replace. The blank cut-out is used up. 


From the above it will be seen that for each test subject, in 
case he attempts every model, there will need to be supplied anew 
the following materials which are used up with each administra- 
tion: 


1 pink cord 194 inches long, used in the bead stringing model. 
1 pink cord 15 inches long, used in the rosette model. 
1 piece of gingham, used in the cross stitch design. 
2 No. 5 threaded needles, threaded with 15 inches of No. 16 black 
  sewing thread, used in the cross stitch and tape-sewing models. 
1 piece of muslin 33 by 6} inches, used in the tape sewing model. 
1 tape 4 by 6% inches, used in tape sewing model. 
1 pair of 34-inch pink cords, used in card wrapping model. 
1 manila card 64 by 33 inches, used in booklet. 
1 inch square of blue mucilaged paper, used in booklet. 
1 printed cut-out design, used in trimming paper model. 


Since these must be replaced after each administration of the 
test, it is desirable to assemble large quantities of these supply 
materials in the storeroom in order that testing may not be delayed 
by the necessity of spending time upon preparation of supply 
materials. After all the envelopes have been assembled, arrange 
them in order, B to K inclusive, and place one of the boxes, No. A, 
and one each of the envelopes and a pair of scissors in each of the 
large work boxes. This completes a test outfit for one test subject. 


THURSTONE MANUAL TRAINING TEST 


In order to save time in its administration, and since the boys’ 
course in wood working had not covered instructions in me- 
chanical drawing nor in wood finishing nor the finer points in 
cabinet making, twenty of the questions of the Thurstone Manual 
Training Test were eliminated before mimeographing the test. 
The questions eliminated from the original Thurstone Test are 
numbers 1, 2, 5, 6, 7, 9, 12, 21, 23, number 29 (for which was sub- 
stituted the statement “A No. 8 wood bit makes a hole 4 inch in 
diameter”), 45, 46, 48, 50, 51, 54, 62, 64, 66, 73, 81, 88, 93. The 
test was then mimeographed and administered with a time limit 
of 20 minutes (which was more than ample) and with the fol- 
lowing revised directions: 


“Some of the statements below are true and some are false. 
Read each statement. If the statement is true, draw a line under 
‘TRUE’; but if the statement is false then draw a line under 
‘FALSE.’ If you are not sure, guess. It is better to guess than 
to leave out a statement.” 


Samples: A knife is used to cut steel.            True. False. 
         The hammer is used to drive nails.       True. False. 
         Screws should be driven with a hammer.   True. False. 











EVALUATION OF THE OCCUPATIONS OF FATHERS OF TEST 
SUBJECTS 


The occupations of the fathers of the test subjects were given 
credits according to the following scale: 

OCCUPATION GROUP                                    CREDIT 
Miarckited omlaboren-a 0 we oot ee pc bas I 
(Weitg (S alls Dy red I Poe AOR ne Bn ene ee 2 
ile tha csmegs rate ee ae AB en Sie eS ce gewes a 
PLUS SGMOLCIOLIC oc tee i ne oe eine se eente 4 
ELKO CSIOLI A Or eet Re eet ek As ey ube Ene 5 


These values are used in determining for different groups the 
average occupation of father. It is not claimed for this scale that 
the intervals between steps are equal; but, considering them so in 
the statistical evaluation of occupation, one secures certain 
significant statistical results; the results would be different if a 
different scale were used. If, however, one may secure results of 
value with this rough scale, he can secure still better results with a 
better scale, more objectively worked out and with the scale 
intervals more nearly equal. The scale is readily applied, and 
has a high scoring reliability. 


THE ROLE OF PROBABILITY IN MAKING INDIVIDUAL 
RECOMMENDATIONS 


The recommendations to a pupil who has taken the tests should 
be couched in terms of probability determined from a correlation 
plot or “scattergram” between the criterion and the composite 
test score. Kitson¹ gives a concrete illustration of the necessity of 
this by analogy from the insurance business, quoted below: 


“A man of thirty years inquires of an insurance company if he 
will live to the age of seventy. Actuarians have studied thou- 
sands of cases and have discovered that out of every thousand 
men who are sound at thirty, a fairly constant number, say one 
hundred, become septuagenarians. The company physician 
tests this man and finds him sound. But it does not tell him: 
‘Yes, you will live to the age of seventy.’ For although one 
hundred in every thousand thirty-year-old sound men achieve the 
septuagenary, this man may be one of the nine hundred who die 
at an earlier age. Accordingly the physician states the man’s 
longevity in terms of probability saying: ‘You have one chance 
in ten of living to the age of seventy.’ And to show the strength 
of its conviction, the company is willing to wager a specified sum 
with the applicant.” 


1 Kitson, H. D. “Vocational Guidance and the Theory of Probability,” School 
Review, Vol. 23, No. 2, 1920, pp. 143-50. 

It should be obvious that, the higher the correlation of the com- 
posite tests with the criterion, the smaller the test score limits 
within which we shall be able to place an individual’s success. 
By a manipulation of the standard error of estimate we can arrive 
at the probability recommendations which are more or less 
independent of the frequencies which occur in any array of the 
table. The standard error of estimate is a function of the correla- 
tion coefficient rather than of the separate arrays, and so is less 
likely to vary than the arrays of the “scattergram” which shows 
the relationship of criterion scores to composite test scores. 
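
The dependence of these limits on the correlation can be illustrated with a short sketch. It is not the authors' own computation: it assumes the usual form of the standard error of estimate, the criterion's standard deviation multiplied by the square root of one minus the squared correlation, and it further assumes that errors about the predicted score follow the normal curve. The function names and the sample figures are hypothetical.

```python
import math


def standard_error_of_estimate(sigma_criterion: float, r: float) -> float:
    """Spread of actual criterion scores about the predicted score.

    Assumes the usual formula sigma * sqrt(1 - r^2).
    """
    return sigma_criterion * math.sqrt(1.0 - r * r)


def chance_of_reaching(threshold: float, predicted: float,
                       sigma_criterion: float, r: float) -> float:
    """Probability that the pupil's criterion score reaches `threshold`.

    Assumes errors of estimate are normally distributed about `predicted`.
    """
    se = standard_error_of_estimate(sigma_criterion, r)
    if se == 0.0:
        # Perfect correlation: the prediction is certain.
        return 1.0 if predicted >= threshold else 0.0
    z = (threshold - predicted) / se
    # Upper-tail area of the normal curve beyond z.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical figures: criterion sigma of 10, predicted score 55, bar at 50.
    for r in (0.3, 0.6, 0.9):
        se = standard_error_of_estimate(10.0, r)
        p = chance_of_reaching(50.0, 55.0, 10.0, r)
        print(f"r = {r:.1f}  standard error = {se:.2f}  chance of reaching 50 = {p:.2f}")
```

As the correlation rises, the standard error shrinks and the statement of chances made to the pupil becomes correspondingly sharper.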


MATERIALS NEEDED FOR VOCATIONAL GUIDANCE TESTS¹ 


Stenquist Boxes 
Stenquist Hospital box (supplies) 

Stenquist Pliers 
Stenquist Scoring Sheets 
I.E.R. Girls’ Assembly Boxes 
I.E.R. Girls’ Assembly Scoring Sheets 
I.E.R. Clerical C—1 Blanks 
I.E.R. C—2 Blanks 
I.E.R. C—2 Directories 
Thorndike Arithmetic Form C Blanks 
Thorndike-McCall Reading Form 8 Blanks 
Reliable watch with second hand (or stop-watch) 
Pencils 
Administrative Directions 


1 The test materials of this list may be obtained from the Bureau of Publications, 
Teachers College, New York City. 


APPENDIX II 
THE CONSTRUCTION OF CRITERIA 


THE STENOGRAPHIC CRITERION 


In order to obtain a very reliable estimate of the pupils’ 
abilities to progress in the acquirement of stenography, the follow- 
ing sources of independent information about each pupil were 
obtained: 


I. Lines of shorthand taken down in 8 min. 15 sec. by the pupil on a formal 
performance test, Army Test 762-A. This test required the pupil to copy 
in shorthand notes an extract from a mimeographed copy, the notes being 
immediately transcribed on the typewriter. The competitive element 
was present, as “the first to finish” was allowed to set the time of the test; 
the fastest pupil finished the test in 8 min. 15 sec., after which the re- 
mainder of the class completed their notes after marking the place at 
which they had arrived when the fastest pupil had finished and had given 
the “stop” signal. This test was given at an average length of practice 
of about 120 days; the variation in amount of practice was due to absence 
or to a week late entrance to school; the latter factor being of small 
importance as the early students “marked time” until the later arrivals 
caught up. The theory of any such measurement is that relative standing 
at equal amounts of practice is highly correlated with rate of ability to 
learn shorthand, and perhaps well correlated with final proficiency. 

Average, $M_{I} = 36.72$; standard deviation, $\sigma_{I} = 6.54$. 

II. Words attempted in transcription of the above test in 14 min. 0 sec. 
This corresponds to speed of transcription. This was also competitive, 
the time of 14 min. 0 sec. being the time at which the fastest pupil com- 
pleted all of the performance test. 

$M_{II} = 228.90$; $\sigma_{II} = 68.79$. 

III. Number of errors in transcription of the above test. This corresponds to 
accuracy of transcription. “Errors” are purely transcriptional errors 
and not typographical errors made on the typewriter. 

$M_{III} = 14.15$; $\sigma_{III} = 11.63$. 

IV. Average percentage school marks on six periodical reviews in the textbook 
theory of shorthand. This is the measure of the amount of theory 
retained until “examination time.” (It is the aim of the instructors to 
complete the textbook theory in six months.) 

$M_{IV} = 88.35$; $\sigma_{IV} = 5.92$. 

V. Total arithmetical sum of errors made on twenty lessons in shorthand 
textbook theory upon first examination in each respective lesson. These 
school marks were recorded in the teacher’s grade book for each lesson. 
If a “passing” mark is not obtained, the pupil is required to take over the 
examination until a “passing” mark is obtained; the errors of the first of 
such examinations were taken, rather than the second or third when they 
occur, as better expressing the pupil’s actual rate of learning and interest 
in stenography. “Errors” are the teacher’s count of what constitutes errors. 

$M_{V} = 121.01$; $\sigma_{V} = 55.48$. 


VI. Number of actual days of practice required to pass the second, or more 
difficult, dictation test at a speed of 75 dictated words per minute with a 
given minimum of 11 words per minute transcription and not over 2 or 3 
transcription errors in a total of 375 words. This is an individual 
examination conducted by the teacher. A dictation test, involving a less 
difficult vocabulary, precedes the more difficult test here taken as a 
measure of the pupil’s ability to progress. From the school attendance 
books, the actual days of attendance, excluding absences, were determined. 

$M_{VI} = 113.46$; $\sigma_{VI} = 10.29$. 

VII. Average of six monthly “over-all” school marks, kept in the registration 
office of the college as the permanent record of the student. This is the 
record available for inspection by prospective employers. The letter 
marks were arbitrarily given numerical scores as follows: F- = 0; F = 1; 
F+ = 2; G- = 3; G = 4; G+ = 5; E- = 6; E = 8; E+ = 10. 

$M_{VII} = 5.35$; $\sigma_{VII} = 1.56$. 


These seven variables were combined into one score by the 
following procedure: 

a. The averages, M’s, and standard deviations, σ’s, of all 
variables were computed. 

b. Two examiners, who had worked through the procedure of 
obtaining the above seven variables and who consequently were 
presumably fair judges of the comparative reliability and im- 
portance of the different variables in predicting ability to progress 
in stenography, were allowed independently to distribute 20 units 
of bids to the seven variables. From the two series, given below, 
a compromise weighting, W, was then decided upon after con- 
sultation and discussion of the factors of reliability or unrelia- 
bility of each variable. The weights given by the two judges to 
the seven variables are: 


                  I    II   III   IV    V    VI   VII   TOTAL 
Judge 1 .....     3    2    1     3     6    4    1     20 
Judge 2 .....     1    2    1     5     6    3    2     20 

Compromise ..     1    2    1     4     6    4    2     20 

Criterion Score 

$$X_c = \frac{W_{I}}{\sigma_{I}} X_{I} + \frac{W_{II}}{\sigma_{II}} X_{II} + \frac{W_{III}}{\sigma_{III}} X_{III} + \frac{W_{IV}}{\sigma_{IV}} X_{IV} + \frac{W_{V}}{\sigma_{V}} X_{V} + \frac{W_{VI}}{\sigma_{VI}} X_{VI} + \frac{W_{VII}}{\sigma_{VII}} X_{VII} + K, \qquad \text{(A)}$$

in which $W_{I}$, $W_{II}$, etc., are respectively the compromise weights 
above; $\sigma_{I}$, $\sigma_{II}$, etc., are respectively the standard deviations of 
Variables I, II, etc. 

K is an arbitrary constant to be taken of such magnitude that 
the criterion scores will be readily handled quantities. It is 
preferable to use the former form and multiply the gross meas- 
ures by $W_{I}/\sigma_{I}$, etc., without finding the deviations and thus being 
compelled to work with negative deviations. 
The formula used is then: 
Criterion Score 

$$X_c = \frac{1}{6.54} X_{I} + \frac{2}{68.79} X_{II} - \frac{1}{11.63} X_{III} + \frac{4}{5.92} X_{IV} - \frac{6}{55.48} X_{V} - \frac{4}{10.29} X_{VI} + \frac{2}{1.56} X_{VII}. \qquad \text{(B)}$$


The minus signs of the above are to be particularly noted. The 
first of the above two formulae assumes that greater numerical 
scores, X’s, in the different variables always indicate greater merit 
or ability. 

Variables III and V, errors, are the reverse of this, greater 
numerical scores indicating less merit; accordingly, $W_{III}$ and $W_{V}$ 
become minus in the specific, or second (B) equation. It will 
also be noted that Variable VI, number of days’ practice required 
before being able to pass a difficult 75-word-per-minute dictation 
test, should receive a negative sign for $W_{VI}$, since more days of 
practice to come up to a given proficiency means less merit, a 
slower rate of learning. The distribution of the twenty bids was 
made in the abstract, i.e., disregarding signs and σ’s and consider- 
ing the relative importance of the variables as dependent upon: 


1 A regression value of K will also result from the simplification of the form: 

$$\frac{W_{I}}{\sigma_{I}}(X_{I} - M_{I}) + \frac{W_{II}}{\sigma_{II}}(X_{II} - M_{II}) + \cdots + \frac{W_{VII}}{\sigma_{VII}}(X_{VII} - M_{VII}).$$

a. The amount of practice of which it is a measure. 

b. The carefulness and objectivity of the grades or scores. 

c. The amount of missing or incomplete measures. (It was 
found necessary to supply school marks, etc., for a given person if 
only a small amount were missing. This is better practice than 
to discard a pupil on whom data are not complete; in fact, it is the 
only possible procedure if one would have a reasonable number of 
subjects left on which to base his tests. It requires a judicious 
statistician to handle the mass of statistics ordinarily found in 
school records. In place of a missing score one may supply either 
(a) the average score of all the subjects, or (b) a score estimated 
by inspection to be the average score to be derived from two or 
three regression equations with other highly related variables.) 
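
The first of the two substitutes named in item c, supplying the group average, can be sketched in a few lines. This is only an illustration under stated assumptions: a missing mark is represented by `None`, the function name is hypothetical, and the sample marks are invented. The second substitute, an estimate by inspection from regression equations, calls for judgment and is not attempted here.

```python
from statistics import mean


def fill_missing_with_average(scores):
    """Return a copy of `scores` with each missing entry (None) replaced
    by the average of the scores actually observed for that variable."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores from which to take an average")
    average = mean(observed)
    return [average if s is None else s for s in scores]


if __name__ == "__main__":
    # Hypothetical review marks for five pupils; the third pupil's mark is missing.
    marks = [90, 84, None, 88, 94]
    print(fill_missing_with_average(marks))
```

Supplying the average leaves the mean of the variable unchanged but slightly reduces its spread, which is one reason the text restricts the practice to cases where only a small amount is missing.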

When simplified, the above equation yields the multipliers¹ of 
the gross scores of the respective seven variables, which products 
we algebraically summate to obtain the one composite criterion 
score. This equation is: 

Criterion Score 

$$X_c = .152X_{I} + .029X_{II} - .086X_{III} + .676X_{IV} - .108X_{V} - .389X_{VI} + 1.282X_{VII}. \qquad \text{(C)}$$


One has now but to take the seven criterion gross scores of 
person A and multiply them in turn by the seven coefficients of 
equation C, and add algebraically to get A’s single criterion score. 
A did: I, 36 lines; II, attempted 229 words; III, made 14 errors in 
transcribing; IV, made average of 94 per cent on six reviews; V, 
had 61 errors in the total twenty lesson examinations; VI, took 
113 days to pass the second dictation test; VII, made an average 
monthly grade of 8 as on the above enumeration. His criterion 
score is accordingly: 


$$X_c = .152(36) + .029(229) - .086(14) + .676(94) - .108(61) - .389(113) + 1.282(8) = 34.04.$$

This is equation C solved for person A. The criterion score, $X_c$, 
of A is 34.04. The scores, after being similarly computed for all 


1 It will be noted that the magnitudes of these new multipliers of the gross scores are 
anything but in proportion to the $W_{I}$, $W_{II}$, etc., weights of regression importance. 
This is pointed out here since certain test makers in the past have thought that “by 
multiplying age by 5 gives that variable a weight of 5.” In reality they were 
multiplying not by 5 but by a true relative importance of B/σ = 5, which figure is to 
be compared by its ratio to a similarly derived one for a second variable. The unit 
of relative importance of a variable in forming a criterion is 10. 

subjects, can be reduced to a basis of small numbers by means of 
a grouping table. 
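
The arithmetic of the stenographic criterion can be retraced with a brief sketch. The weights, standard deviations, signs, and person A's gross scores are taken from the text above; the dictionary layout and function names are hypothetical. Worked either from the unrounded ratios or from the three-decimal multipliers of equation C, the sum for person A comes to about 34.2, slightly above the 34.04 printed above.

```python
# Compromise weights and standard deviations of the seven variables, from the text.
WEIGHTS = {"I": 1, "II": 2, "III": 1, "IV": 4, "V": 6, "VI": 4, "VII": 2}
SIGMAS = {"I": 6.54, "II": 68.79, "III": 11.63, "IV": 5.92,
          "V": 55.48, "VI": 10.29, "VII": 1.56}
# Variables for which a larger gross score means less merit.
REVERSED = {"III", "V", "VI"}

# Three-decimal multipliers as printed in equation C.
PRINTED_MULTIPLIERS = {"I": .152, "II": .029, "III": -.086, "IV": .676,
                       "V": -.108, "VI": -.389, "VII": 1.282}

PERSON_A = {"I": 36, "II": 229, "III": 14, "IV": 94, "V": 61, "VI": 113, "VII": 8}


def multipliers():
    """Signed multiplier W / sigma for each variable, as in equation B."""
    return {k: (-1 if k in REVERSED else 1) * WEIGHTS[k] / SIGMAS[k]
            for k in WEIGHTS}


def criterion_score(gross, mults=None):
    """Algebraic sum of gross scores times their multipliers."""
    mults = multipliers() if mults is None else mults
    return sum(mults[k] * gross[k] for k in mults)


if __name__ == "__main__":
    for name, value in multipliers().items():
        print(f"{name:>3}: {value:+.3f}")
    print("A, unrounded multipliers:", round(criterion_score(PERSON_A), 2))
    print("A, printed multipliers:  ",
          round(criterion_score(PERSON_A, PRINTED_MULTIPLIERS), 2))
```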


THE TYPING CRITERION 


The procedure followed here was in general the same as for the 
stenographic criterion. The following five variables were ob- 
tained. 


I. Days of actual practice, exclusive of absences, required to complete lessons 
1-10 of the copy textbook. This might be called the first stage of typing 
progress. 

$M_{I} = 31.55$; $\sigma_{I} = 15.46$. 

II. Days of actual practice, exclusive of absences, required to complete lessons 
11-20 of the copy textbook. This might be called the second stage of 
typing progress. 

$M_{II} = 24.81$; $\sigma_{II} = 10.18$. 

III. Days of actual practice, exclusive of absences, required to complete lessons 
21-30 of the copy textbook. This might be called the third stage of typing 
progress. Inasmuch as but few pupils had advanced farther, it was 
impossible to test the pupils at their final limit of progress. Variables 
I-III measure rate of progress satisfactorily. 

$M_{III} = 31.90$; $\sigma_{III} = 10.38$. 

IV. Average monthly “over all” school marks supplied from the registration 
office records, reduced to numerical values by the arbitrary scale: P+ = 0; 
F- = 1; F = 2; F+ = 3; G- = 4; G = 5; G+ = 6; E- = 7; E = 8; E+ = 10. 

$M_{IV} = 6.08$; $\sigma_{IV} = 1.43$. 

V. The average of two independent rankings by the teacher, a week apart, of 
“potential ability” in typing. This was taken because of scarcity of other 
data and omissions of data in individual cases. The slip arrangement 
method was used; correlation of the rankings being ρ = .94 ± .01. The 
average rankings were changed into index numbers of 1 to 5, 5 per cent of 
the group being given the index 1 (lowest); 20 per cent given 2, 50 per cent 
given 3, 20 per cent given 4, and 5 per cent given 5 (highest). 

$M_{V} = 2.98$; $\sigma_{V} = .91$. 
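
The conversion of averaged rankings into index numbers described under Variable V can be sketched as follows. The percentage bands (5, 20, 50, 20, 5) are those given in the text; the rule used here for a group whose size does not divide evenly, placing each pupil by the midpoint of his position, is an assumption, as are the function names.

```python
# Cumulative percentage limits of the five bands: 5, 20, 50, 20 and 5 per cent.
CUMULATIVE_BANDS = [(5, 1), (25, 2), (75, 3), (95, 4), (100, 5)]


def index_for_position(position: int, group_size: int) -> int:
    """Index number (1 to 5) for the pupil at `position`, counted from 0 at the
    lowest-ranked pupil. The pupil is placed by the midpoint of his position,
    an assumed rule; whole-number arithmetic avoids rounding trouble."""
    for cumulative_percent, index in CUMULATIVE_BANDS:
        if (2 * position + 1) * 100 <= cumulative_percent * 2 * group_size:
            return index
    return 5


def index_numbers(pupils_lowest_to_highest):
    """Map each pupil, listed from lowest to highest average ranking, to an index."""
    n = len(pupils_lowest_to_highest)
    return {pupil: index_for_position(i, n)
            for i, pupil in enumerate(pupils_lowest_to_highest)}


if __name__ == "__main__":
    # Hypothetical class of twenty pupils, already ordered from lowest to highest.
    class_list = [f"pupil {k}" for k in range(1, 21)]
    assigned = index_numbers(class_list)
    for index in range(1, 6):
        count = sum(1 for v in assigned.values() if v == index)
        print(f"index {index}: {count} pupils")
```

For a class of twenty this gives one pupil the index 1, four the index 2, ten the index 3, four the index 4, and one the index 5.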


These five variables were independently judged for relative 
importance, disregarding σ’s and signs, by two examiners, and a 
compromise weighting of each determined by discussion. 

                  I    II   III   IV    V    TOTAL 
Examiner 1 ..     5    3    2     4     6    20 
Examiner 2 ..     4    5    4     3     4    20 
Compromise ..     4    5    3     3     5    20 



The equation for multiplying gross scores accordingly becomes: 

$$X_c = -\frac{4}{15.46} X_{I} - \frac{5}{10.18} X_{II} - \frac{3}{10.38} X_{III} + \frac{3}{1.43} X_{IV} + \frac{5}{.91} X_{V} + K.$$

$$X_c = -.259X_{I} - .491X_{II} - .289X_{III} + 2.098X_{IV} + 5.482X_{V} + 40.$$


As will be seen above, the larger the scores of the first three vari- 
ables the less the merit, so that these three variables receive a 
minus sign in the above equations. The constant term, 40, 
insures that the composite scores $X_c$ will all be positive quantities. 
These were then reduced to small numbers by means of a grouping 
table. 


THE BOOKKEEPING CRITERION 


Six variables enter into the bookkeeping criterion: 


I. Number of days spent in completing the first of the three divisions of the 
bookkeeping course. 

$M_{I} = 38.26$; $\sigma_{I} = 20.65$. 

II. Percentage school mark in the first division work, as recorded in the 
teacher’s class grade book. 

$M_{II} = 91.01$; $\sigma_{II} = 4.55$. 

III. Number of days spent in completing the second of the three divisions of 
the bookkeeping course. 

$M_{III} = 50.33$; $\sigma_{III} = 19.77$. 

IV. Percentage school mark in the second division work, as recorded in the 
teacher’s class grade book. 

$M_{IV} = 90.38$; $\sigma_{IV} = 5.35$. 

V. Number of days spent in completing both the first and second divisions of 
the bookkeeping course. This cumulative score was taken because of the 
number of instances in which it was necessary to estimate the days upon 
which a given student completed the first division and began the second, 
the lessons of the textbook not necessarily being taken by all the students 
in the chronological order of the textbook. This variable is free from that 
attenuating factor. This represents, on the average, about five months’ 
practice in the bookkeeping course. 

$M_{V} = 88.81$; $\sigma_{V} = 36.96$. 

VI. Average monthly “over all” school marks supplied from the registration 
office records, reduced to numerical values by the arbitrary scale: F = 0; 
F+ = 1; G- = 2; G = 4; G+ = 5; E- = 7; E = 9; E+ = 10. 

$M_{VI} = 5.47$; $\sigma_{VI} = 1.95$. 

The relative importance of the six variables, as estimated by the 
two judges, is: 

                  I    II   III   IV    V    VI   TOTAL 
Examiner 1 ..     2    5    2     5     3    3    20 
Examiner 2 ..     2    3    2     4     3    6    20 

Compromise ..     2    4    2     5     3    4    20 


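The equation for multiplying gross scores accordingly becomes: 

$$X_c = -\frac{2}{20.65} X_{I} + \frac{4}{4.55} X_{II} - \frac{2}{19.77} X_{III} + \frac{5}{5.35} X_{IV} - \frac{3}{36.96} X_{V} + \frac{4}{1.95} X_{VI} + K.$$

$$X_c = -.097X_{I} + .880X_{II} - .101X_{III} + .935X_{IV} - .081X_{V} + 2.050X_{VI} + K.$$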


Variables I, III, V are given minus signs in the above equations 
since the larger numerical scores mean the less merit. A grouping 
table was used to reduce these scores to small numbers. 


THE GENERAL BUSINESS CRITERION 


The criterion scores in each of the three criterion variables were 
changed into multiples of the respective o’s of the three criteria. 
Use was made of Table XXXIII. 

Some individuals had two or even three criterion scores. Where 
such was the case, the two or three sigma scores were averaged for 
the final score of the individual. 

It will be seen that this procedure assumes that equal sigma 
positions in the three criteria mean equal merit in the general 
business ability criterion. Had a more refined technique been used, 
such as could be used if it were possible to obtain accurate meas- 
ures of the overlapping of each of these groups upon the others, mak- 
ing allowances for such by adding credit to two of the criterion scores 
according to the interval of the two larger successively above the 
one lowest, then higher correlations could be obtained than the 
final multiple r we have obtained. 

Av. of Typ. total raw scores = 638.5. 
Av. of Sten. total raw scores = 623.1. 
Av. of Bkkg. total raw scores = 004-0. 


TABLE XXXIII 
CRITERION SCORES 

General Criterion Component = (Score - M) x (1/σ) 

             TYPING        STENOGRAPHY   BOOKKEEPING 
σ   =        3.48          2.82          3.83 
1/σ =        .28735632     .35460993     .26109661 
M   =        6.51          9.03          9.71 

          DEVIATIONS (Score - M)           σ-POSITIONS (Deviation x 1/σ) 
CRIT.     Typing    Stenog.   Bkkg.        Typing     Stenog.    Bkkg.       CRIT. 
SCORE     -6.51     -9.03     -9.71        Component  Component  Component   SCORE 
  1       -5.51     -8.03     -8.71        -1.58      -2.85      -2.27         1 
  2       -4.51     -7.03     -7.71        -1.30      -2.49      -2.01         2 
  3       -3.51     -6.03     -6.71        -1.01      -2.14      -1.75         3 
  4       -2.51     -5.03     -5.71         -.72      -1.78      -1.49         4 
  5       -1.51     -4.03     -4.71         -.43      -1.43      -1.23         5 
  6        -.51     -3.03     -3.71         -.15      -1.07       -.97         6 
  7         .49     -2.03     -2.71          .14       -.72       -.71         7 
  8        1.49     -1.03     -1.71          .43       -.37       -.45         8 
  9        2.49      -.03      -.71          .72       -.01       -.19         9 
 10        3.49       .97       .29         1.00        .34        .08        10 
 11        4.49      1.97      1.29         1.29        .70        .34        11 
 12        5.49      2.97      2.29         1.58       1.05        .60        12 
 13        6.49      3.97      3.29         1.86       1.41        .86        13 
 14        7.49      4.97      4.29         2.15       1.76       1.12        14 
 15        8.49      5.97      5.29         2.44       2.12       1.38        15 
 16        9.49      6.97      6.29         2.73       2.47       1.64        16 
 17       10.49      7.97      7.29         3.01       2.83       1.90        17 
 18       11.49      8.97      8.29         3.30       3.18       2.16        18 
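
The sigma positions of Table XXXIII, and the averaging of two or three of them for one individual, can be reproduced with a short sketch. The means and standard deviations are those of the table; the function names and the sample individual are hypothetical.

```python
# Mean and standard deviation of each grouped criterion, from Table XXXIII.
CRITERIA = {
    "typing": (6.51, 3.48),
    "stenography": (9.03, 2.82),
    "bookkeeping": (9.71, 3.83),
}


def sigma_position(criterion: str, score: float) -> float:
    """Deviation of a grouped criterion score from its mean, in sigma units."""
    mean, sigma = CRITERIA[criterion]
    return (score - mean) / sigma


def general_business_score(scores: dict) -> float:
    """Average of the sigma positions for whichever criteria the individual has."""
    positions = [sigma_position(name, value) for name, value in scores.items()]
    return sum(positions) / len(positions)


if __name__ == "__main__":
    # A few rows of the table, for comparison with the printed components.
    for score in (1, 9, 18):
        row = [f"{sigma_position(name, score):6.2f}" for name in CRITERIA]
        print(score, *row)
    # Hypothetical individual with a typing score of 10 and a stenography score of 10.
    print(round(general_business_score({"typing": 10, "stenography": 10}), 2))
```

Averaging the positions treats one sigma in typing as worth exactly one sigma in stenography or bookkeeping, which is the assumption the text itself points out.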














APPENDIX III 
NOTES ON THEORY AND TECHNIQUE 


THE MULTIPLE RATIO CORRELATION TECHNIQUE 


Test scores may be weighted before combining them into a 
composite scale score which will predict the criterion better than 
if weighting is not resorted to. The maximum correlation of the 
weighted composite with the criterion is secured by weighting 
each test proportional to its partial regression coefficient. The 
partial regression technique is so laborious that its use is known to 
but few. The returns, in increased validity of a scale due to 
weighting the tests by this method, have frequently seemed small 
in comparison with the labor involved in securing them. The use 
of the method has been criticised by many on the grounds that, 
with the number of cases which one ordinarily employs in an 
investigation, the partial correlation coefficients, and conse- 
quently the partial regression coefficients, have high P.E.’s. 
This objection is unfounded, however, since the multiple corre- 
lation coefficient (or correlation coefficient which expresses the 
validity of the combined or weighted scale) always has a smaller 
unreliability than any one of the individual partial correlation 
coefficients. An average is a very stable central value obtained 
from a widely varying number of individual components, the 
gross scores; in the same way, the multiple correlation coefficient 
is a rather stable value derived from a number of more unreliable 
components. 

After one has a scale of a few tests, the addition of many more 
tests adds ordinarily but little to the efficiency of the shorter scale. 
There has been no way in the past whereby one can pick out the 
most efficient set of tests for combining into a scale. After having 
approximately determined the partial regression weights of a 
number of tests, Kelley eliminates successively the tests of lowest 
partial regression weights, determining after each successive 
elimination the correlation of the remaining composite, as 
weighted, with the criterion. Kelley has devised formulae for 
readily determining the combination¹ correlation coefficient, as 
each test is successively given a weight of zero, which thus 
eliminates it. This procedure necessitates the solution of all the 
possible intercorrelations of the tests and the determination of 
either the true or the approximate regression equation, a pro- 
cedure which is very laborious when many tests are involved. A 
different procedure has been used by Rosenow, who assumes that 
those tests which have the highest partial correlation coefficients 
of the mth order are the ones which will combine to best advantage. 
Consequently, in his work with tests for predicting college marks, 
he determined the partial correlation coefficients of the fourteenth 
order and then combined the five tests having the highest four- 
teenth order partial correlation coefficients to make a scale for 
predicting college marks. This procedure is probably statisti- 
cally less sound than that used by Kelley, and the procedure is 
much less systematic. 

The new multiple ratio correlation technique enables one to 
determine the n best tests to combine into a scale, selected from a 
larger number of tests, N. Where an adequate criterion is avail- 
able, the method thus allows one with minimum effort to deter- 
mine the n “major causes” from the N “causes” investigated. 
When these n tests are weighted with the multiple ratio regression 
weights, the efficiency of the weighted scale in predicting the 
criterion is given by the multiple ratio correlation coefficient. 

In its use in scale making, that test which yields the highest 
correlation with the criterion is taken for the “backbone” test of 
the scale. Each of the remaining tests is then investigated, by 
means of the appropriate equation, to determine which test, when 
added to the tests already in the scale, will make the weighted 
composite of two tests correlate highest with the criterion. To 
this test is then added that one of the remaining tests which will 
make the new composite of three tests correlate highest with the 
criterion, and so on. In this way the scale is built up rather than 
torn down, as in Kelley’s procedure. With only twenty or so tests 
to choose from, the multiple ratio correlation coefficient (close 


1 The term “combination correlation coefficient” is here used to represent the 
correlation between the criterion and a combination of tests which is weighted with 
a series of weights which yields other than the true multiple correlation coefficient. 
In the above case, combination correlation coefficients are rough approximations to 
the multiple correlation coefficients, at least at the beginning of the series of suc- 
cessive eliminations of tests with the lowest partial regression coefficients. 

approximation to the true multiple correlation coefficient) after 
the inclusion of a few tests in the scale approaches the magnitude 
which it would have with all the tests in the scale. Once the tests 
are arranged in order of decreasing amount of contribution in 
predicting the criterion, and the appropriate multiple ratio corre- 
lation coefficients after the inclusion of each successive test are 
known, one may consider the scale complete just as soon as the 
increase in the multiple ratio correlation coefficient seems not to 
justify the labor involved in the administration and scoring of an 
additional test. 
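
The building-up procedure can be sketched as a step-by-step selection. The multiple ratio formulae themselves are not reproduced in this section, so the sketch substitutes the ordinary multiple correlation coefficient, computed from the correlation table by solving the normal equations; this swap, the function names, the stopping rule (`min_gain`), and the sample correlations are all assumptions made for illustration.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def multiple_r(subset, r_criterion, r_tests):
    """Ordinary multiple correlation of the criterion with the tests in `subset`."""
    inter = [[r_tests[i][j] for j in subset] for i in subset]
    with_criterion = [r_criterion[i] for i in subset]
    betas = solve(inter, with_criterion)
    r_squared = sum(b * r for b, r in zip(betas, with_criterion))
    return math.sqrt(max(r_squared, 0.0))


def build_scale(r_criterion, r_tests, min_gain=0.01):
    """Add, one at a time, the test that raises the composite correlation most;
    stop when the gain falls below `min_gain` (an assumed stopping rule)."""
    chosen, history, current = [], [], 0.0
    remaining = list(range(len(r_criterion)))
    while remaining:
        best = max(remaining,
                   key=lambda t: multiple_r(chosen + [t], r_criterion, r_tests))
        new_r = multiple_r(chosen + [best], r_criterion, r_tests)
        if chosen and new_r - current < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        history.append((best, new_r))
        current = new_r
    return history


if __name__ == "__main__":
    # Hypothetical correlations of three tests with the criterion and with each other.
    r_criterion = [0.60, 0.50, 0.55]
    r_tests = [[1.0, 0.2, 0.9],
               [0.2, 1.0, 0.3],
               [0.9, 0.3, 1.0]]
    for test, r in build_scale(r_criterion, r_tests):
        print(f"added test {test}: composite correlation {r:.3f}")
```

In the hypothetical figures the third test correlates well with the criterion but so closely with the “backbone” test that it adds almost nothing, and the scale is closed at two tests.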

The technique, the discovery of which was motivated by the 
need for a rapid, systematic method of weighting tests, has grown 
up by a series of successive discoveries of formulae extending 
through the year of the investigation. Consequently, at the 
time when its use was first undertaken in this problem, the final 
elaboration was not available. It was therefore impossible to 
secure the greatest value of the technique in this investigation. 
Its use has been much simplified by the preparation of printed 
form charts so that the selection of the five best tests out of, say, 
fifteen, is but the work of a few hours at the present time. With 
the revised technique it is necessary to solve but a few of the 
possible intercorrelations of the tests if one is selecting the n best 
tests out of a larger number. 


A NOTE ON SCORING FORMULAE 


Where true-false tests are given, the common practice is to score 
the test by the scoring formula: score equals rights minus wrongs. 
The procedure is justified, even on the part of those who profess to 
disbelieve in the practical value or statistical validity of using 
partial regression equations for determining the weights of in- 
dividual tests, on an a priori assumption that, inasmuch as one 
may presumably get half of the questions right by chance, he 
should be discounted an amount which would (1) give a score of 
100 per cent to the person who has no errors and (2) give a score of 
zero to a person who gets 50 per cent of the questions correct. If 
the question were one of mere chance, as here assumed, the scoring 
formula above would adequately take care of the situation. But, 
the situation which would be chance in a dice box is not neces- 
sarily at all psychologically a chance situation. As a matter of 
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fact, if it be assumed that a person is totally ignorant of the proper 
answer to a given question, he is frequently predisposed to under- 
score the one answer rather than the other by some psychological 
element in the question. Frequently this same hypothetically 
ignorant individual could be made to underscore the opposite 
answer to the one which he did first by the simple expedient of 
changing a word with positive intent so as to become one of 
negative intent. 

Wherever such scoring formulae are used, the implicit assump- 
tion is that the use of such a scoring formula will yield better 
correlations with the criterion to be predicted than if the scoring 
formula were not used. If a valid criterion is available it is a 
simple matter to determine the correlation with the criterion 
which would result from the use of any given scoring formula. 
Let Rights always be weighted 1.00 in the scoring formula, and 
let errors be weighted an amount, C, which may be either a 
positive or negative quantity of any amount. 

Let $r_{IR}$ be the correlation between the criterion (I) and the 
Rights (R). 

Let $r_{IW}$ be the correlation between the criterion and the 
Wrongs (W). 

Let $r_{RW}$ be the correlation between the Rights and the Wrongs. 
Then the correlation of the criterion with the score, 
$S = R + C \cdot W$, is given by the formula: 

$$ r_{IS} = \frac{r_{IR}\,\sigma_R + r_{IW}\,C\,\sigma_W}{\sqrt{\sigma_R^2 + C^2\sigma_W^2 + 2\,r_{RW}\,\sigma_R\,C\,\sigma_W}} $$
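
The formula for the correlation of the criterion with S = R + C·W can be tried numerically. The following is a minimal sketch, not part of the original study: the function names and the input correlations are hypothetical, and the second function uses the standard two-predictor least-squares result for the ratio of gross score weights, which the text implies but does not print.

```python
import math


def score_criterion_correlation(r_ir, r_iw, r_rw, sd_r, sd_w, c):
    """Correlation of the criterion with S = R + c*W, per the formula in the text."""
    numerator = r_ir * sd_r + r_iw * c * sd_w
    variance_s = sd_r**2 + c**2 * sd_w**2 + 2 * r_rw * sd_r * c * sd_w
    return numerator / math.sqrt(variance_s)


def regression_weight_for_wrongs(r_ir, r_iw, r_rw, sd_r, sd_w):
    """Weight c on Wrongs (Rights weighted 1.00) from two-predictor least squares.

    Assumption: this is the textbook ratio of the two gross score
    regression weights; it is undefined when r_ir == r_iw * r_rw.
    """
    return (sd_r / sd_w) * (r_iw - r_ir * r_rw) / (r_ir - r_iw * r_rw)


# Hypothetical inputs, chosen only for illustration.
example = dict(r_ir=0.50, r_iw=-0.40, r_rw=-0.30, sd_r=10.0, sd_w=5.0)

for c in (0.0, -1.0):  # S = R, and S = R - W
    print(c, round(score_criterion_correlation(c=c, **example), 3))

best_c = regression_weight_for_wrongs(**example)
print(round(best_c, 3), round(score_criterion_correlation(c=best_c, **example), 3))
```

With C = 0 the function returns r_IR itself, which is the sense in which S = R is also a scoring formula. Comparing the three printed correlations shows how much, or how little, a given discount for errors changes validity for one set of inputs.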


It should not be forgotten that from the statistical point of view, 
the scoring formula S=R is just as truly a scoring formula which 
gives Wrongs a perfectly definite weight as any other of the very 
many scoring formulae which might be adopted. When we use 
this formula we are implying either that (1) the Wrongs have a 
partial regression contribution of zero or that (2) the contribution 
is so slight as not to justify the use of a scoring formula, in which 
case C may be said to be zero when taken to the nearest integral 
value. 

Where a criterion is available, other weightings for errors than 
a discount of 1.00 each are the rule rather than the exception in 
true-false tests. In the same way, there is no more justification 
for using an arbitrary scoring formula on recognition tests 
involving three or more alternatives than there is in the case of the 
true-false tests. 

The most helpful point of view to be taken in regard to 
weighting tests or in using scoring formulae is a rather simple one. This 
point of view is that there is no right nor wrong way to weight 
tests or elements of tests, or to give relative weights to speed and 
accuracy; that with a specific purpose in mind, we should use 
those weights and those scoring formulae which will give us the 
maximum predictive value for the minimum of effort. As a 
matter of fact, the scoring formula S = R + C·W is but a very 
crude approximation to the intricate scoring formula, involving 
various exponential functions of the gross measures, which we 
might have. At the present stage of our tests, few people would 
think of devoting their time to the calculation of exponential 
scoring formulae. Such weighting of tests, test elements, speed 
and accuracy may possibly mark the development of a more 
complicated mathematical procedure in tests once we have criteria 
adequate enough to justify such refined technique. For that 
matter, it may be true that even now a sliding scale system of 
weighting such factors would yield returns of sufficient value to 
justify its use. It is true that partial regression equations 
based on assumptions of linearity (that is, that the weight of a 
gross score X is a constant whatever the value of X) are the 
simplest, or limiting, case of more complicated curvilinear 
functions. 

The use of any scoring formula, the simplest of which is S = R, 
makes then the same implicit assumptions (although empirically 
not ordinarily so very bad ones) as are made when test makers add 
the gross scores on all tests, often with the naive faith that thereby 
they are giving "no undue weight to any one test," believing that 
they are thus keeping out of error by their excessive conservatism. 
It frequently happens that the scoring formula S = R gives 
acceptable correlations, and it also frequently happens that weighting 
tests by adding the gross scores (which practically never weights 
the tests equally) also yields very good correlations with the 
criteria at hand. There is no scientific merit whatever to the 
statement, "In the absence of an adequate criterion, I preferred 
not to weight my tests." If any scores are given, weights are 
given. The method of giving valid arbitrary importances to 
variables is discussed under the criteria of the C-1 Clerical Test. 


THE RELIABILITY OF SELECTION OF 140-QUESTION GENERAL 
TRADE TEST FROM A LIST OF 204 QUESTIONS 


The general trade test of 204 questions, which was administered 
during the summer school of 1920 to several hundred soldiers, was 
reduced to 140 questions largely on a subjective basis: the 
known difficulties in scoring, and the likelihood of the retained 
questions being an adequate sampling of the different trade 
groups in which the questions were classified in the revised 
140-question set of the army general trade test. The scores on the 
140 questions of 191 men in the summer school who had taken the 
longer form were correlated with the total scores on the longer 
form, yielding a reliability coefficient of .978 ± .003. It is a 
question, of course, whether such high reliability would be secured 
upon a second giving of the 140-question set; but the fact that 
the correlation coefficient is of this magnitude demonstrates 
that the general trade test, presumably a good measure of 
interest in mechanical things, has a very high reliability. 


THE RELIABILITY OF THE ONE-WORD-ANSWER TRADE TEST 


After the summer school students in vocational courses at 
Camp Grant, Ill., in 1920, had completed their six weeks of 
summer school instruction, they were given final examinations in 
the one-word-answer form covering their respective courses. The 
reliabilities of this form of examination are shown below, 
computed by correlating the scores on the odd-numbered questions 
with the scores on the even-numbered questions. These 
correlations are uniformly high. The column headed N gives the 
number of cases on which the reliabilities of the various 
examinations are based; that headed $r_{\frac12\frac12}$ gives the reliability 
coefficient for the odd-numbered with the even-numbered scores; that 
headed $r_{nn}$ gives the reliability, by Brown's formula, of Form A with 
Form B of the examination of the present number of questions, n, 
in each case; that headed $r_{nn(50)}$ is the reliability by Brown's 
formula of a 50-question set of the present type of examination 
with another 50-question set, that is, taking enough multiples of 
the present n-question examination that there would be 50 
questions in the set in place of the number which is now given in 
the second column of figures of the table. 


TABLE XXXIV 


RELIABILITY OF ONE-WORD-ANSWER TRADE TEST FINAL EXAMINATIONS 
GIVEN TO THE SUMMER SCHOOL STUDENTS, CAMP GRANT, 1920 

EXAMINATION         AV. SCORE     NO. OF      N             r_½½    r_nn    r_nn(50)
                                  QUESTIONS
Electrician         10.0          40          34            .699    .823    .854
Welder              16.4          40          8             .720    .837    .865
Motorcyclist        10.6          38          10            .729    .843    .876
Plumber             [illegible]   40          16            .761    .864    .888
Draftsman           [illegible]   45          14            .780    .876    .887
Engine Mech.        20.0          49          100           .885    .939    .940
Machinist           [illegible]   52          30            .898    .946    .944
Storage Battery     6.8           18          [illegible]   .801*   .890    .957
Prognosis Tests:
  6A (Trades)       51.5          204         102           .886    .940    .792
  6B (Scattered)    58.2          204         102           .919    .958    .847

* Spuriously high, too many zero scores. 
r_½½ is the reliability coefficient of half of the test against the other half. 

$$ r_{nn} = \frac{2\,r_{\frac12\frac12}}{1 + r_{\frac12\frac12}} \qquad r_{nn(50)} = \frac{\frac{50}{n}\,r_{nn}}{1 + \left(\frac{50}{n} - 1\right) r_{nn}} $$
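
The two applications of Brown's formula behind the last two columns can be checked against the table. This is a minimal sketch under the assumption that the columns were computed as described in the text (odd-even correlation stepped up to full length, then rescaled to 50 questions); the function names are illustrative only.

```python
def brown_full_length(r_half):
    """Reliability of the whole test from the odd-even (half against half) correlation."""
    return 2 * r_half / (1 + r_half)


def brown_rescaled(r_nn, n_questions, target_questions=50):
    """Reliability of a test lengthened or shortened from n to target questions."""
    k = target_questions / n_questions
    return k * r_nn / (1 + (k - 1) * r_nn)


# (examination, number of questions, odd-even correlation) taken from Table XXXIV
rows = [
    ("Electrician", 40, 0.699),
    ("Engine Mech.", 49, 0.885),
    ("6B (Scattered)", 204, 0.919),
]

for name, n, r_half in rows:
    r_nn = brown_full_length(r_half)
    print(name, round(r_nn, 3), round(brown_rescaled(r_nn, n), 3))
```

The computed values agree with the tabled ones to within a unit or two in the third decimal place, the small differences presumably arising from rounding of intermediate figures.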


The unusually high reliability of such trade test forms of 
questions points to the advisability of using this type of 
examination wherever it is possible to do so. It may readily be applied 
to any form of vocational interest test, where it will in all 
probability measure interest better than the true-false type of 
examination. One may more surely be said to be interested in a 
subject if he can recall something about it (that is, remember it 
to the point of recall) than if he merely can recognize it upon its 
presentation. This a priori argument holds, of course, only for a 
single item of information. The respective merits of the 
one-word-answer form of question and of other forms of test reaction 
are only to be settled in terms of the validity of the respective tests. 
They are not even to be settled on the grounds of reliability. If the 
test has high enough validity, we do not mind whether it is as 
reliable as it might be or not. A test which is unreliable can 
always be improved by lengthening the time, increasing the 
number of elements, improving the directions, deleting obscure 
phrases and questions, or, as recommended by Herring, making a 
detailed psychological analysis of each question based upon the 
introspective reactions of subjects who have taken the test, 
thereupon rewording the element until it secures the desired response 
from those of superior ability. 


THE RELIABILITY OF THE UNREVISED MECHANICAL INTEREST 
TEST 


The original edition of the Mechanical Interest Test was 
devised by Dr. Edgar Rice. It contained forty-six questions, some 
of which had as many as four tools required as an answer, while 
eight were of a different type, one-word-answer questions in 
answer to the general question, "What is the name of tool No. 
14?" The reliability by the odds-evens method was computed, 
the subjects being the 272 pupils enrolled in the Camp Grant 
Summer School, which included those studying machine shop, 
electrical, and automotive work, men and women teachers, and 
business students. The reliability coefficient of the odd-even scores is .80. 
By the application of Brown's formula, Form A correlates with 
Form B of the same test of 46 questions to the extent of .89, which, 
reduced to our common basis of 50 questions on Form A with 
Form B, yields the high reliability of $r_{nn(50)}$ = .894. This 
reliability was subsequently increased to .923 by care used in revision 
of the test: by eliminating or changing the wording of those 
questions whose meaning was obscure, by eliminating entirely 
the eight questions of the totally different type which required a 
different mental set, and by the adoption of the scoring formula 
which yields highest reliability. Such revision always results in 
increased reliability of the test. It also tends to increase the 
validity of the test, although not in proportionate amount. One 
may usually select n questions from a larger number, n′, which will 
statistically yield a much higher validity. It is not known 
whether this statistically attained validity would persist upon 
re-examination by means of a test consisting of the selected questions 
only, although there is no reason for believing that the increase 
would disappear entirely or even to half its extent. The same 
assumptions are made regarding separate tests taken from a 
longer scale and given differential weights. Weighting of the 
answers of individual questions may be resorted to. Thus far, 
test makers have investigated this possibility but little further 
than to eliminate those questions which have a low or negative 
correlation with the criterion. 


THE SCORING METHOD OF THE REVISED MECHANICAL INTEREST 
TEST 

This investigation is interesting from the point of view of the 
technique involved. The Mechanical Interest Test asks at the 
beginning "What tools are used [followed by individual numbered 
situations such as] to saw up railroad ties into firewood lengths?" 
In some cases the answer will be the number, taken from the 
pictures, of a single tool, and in some cases the numbers of two 
tools. There are thus three possible ways in which scores might 
be computed. 

A. Count one error for each number not correctly given in the 
brackets as indicated below, one credit accordingly being given 
for each number which is right. This would mean that questions 
requiring two answers would receive two points credit and ques- 
tions having only one number as answer would receive only one 
point credit if entirely correct; and one error would be counted for 
each number incorrect. 

B. If either or both answers to a double answer question are 
incorrect, score it zero, and score it 1 only when both are correct, 
giving one credit to the correct one-answer question if correct. 

C. Give one credit to each correct number of a double-parenthesis 
answer, as in (A), making a total of two credits when both are 
right, and likewise give two credits to a correct single-parenthesis 
answer. This amounts to giving each question two credits and using 
partial credits in the case of double-answer questions. 

This technique assumes, in the absence of a better criterion, 
that the best scoring method is the one which has the highest 
reliability. 
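
The three ways of counting credits can be stated compactly in code. The sketch below is illustrative only: the answer key, the responses and the tool numbers are made up, and it assumes that the order in which a subject writes the two numbers of a double answer does not matter. It computes credits only; the error counts mentioned under (A) are not modeled.

```python
def count_correct(key, response):
    """Number of keyed tool numbers that appear in the response (order ignored)."""
    return len(set(key) & set(response))


def score_question(key, response, method):
    correct = count_correct(key, response)
    if method == "A":  # one credit per correct number
        return correct
    if method == "B":  # all or nothing, one credit per question
        return 1 if correct == len(key) else 0
    if method == "C":  # two credits per question, partial credit on double answers
        return correct if len(key) == 2 else 2 * correct
    raise ValueError(f"unknown method: {method!r}")


def score_test(keys, responses, method):
    return sum(score_question(k, r, method) for k, r in zip(keys, responses))


# Hypothetical four-question key and one subject's answers.
keys = [(3,), (7, 12), (5,), (9, 14)]
responses = [(3,), (7, 2), (8,), (14, 9)]

for method in "ABC":
    print(method, score_test(keys, responses, method))
```

Scoring the odd and even questions separately with each method, and correlating the two half scores over all subjects, would give the reliabilities of halves that the comparison rests on.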

The failures by sentences and by test subject were plotted, 
enabling quick computation of scores by the three scoring meth- 
ods. A stencil form greatly hastened the speed of the original 
entry. For future work of this kind, prepared cross-section paper 
will be found to be very helpful. 

The following reliabilities result: 

SCORING METHODS OF MECHANICAL INTEREST TEST (E.R.T.-5) 
N = 223 cases 

METHOD                          POINTS TAKEN OFF FOR ERRORS              RELIABILITY   RELIABILITY OF
                                                                         OF HALVES     WHOLE TEST
A. Scoring each number          1 for each wrong number                  .770          .876
B. Either wrong, scoring zero   1 for a question with any number wrong   .808          .896
C. Partial credit scoring       1 for each wrong number of a double      .849          .912
                                answer, 2 for a wrong single answer


The results, taking as the best scoring method the one with 
highest reliability, show that a partial credit scoring method of 
two points for each question correct and a partial half credit of 
one point for one of two parentheses correctly filled out is best. 

The adopted method is then the partial credit method, where 
$r_{nn}$, Form A with Form B, is .912. Reduced to a 50-question 
basis, $r_{nn(50)}$ = .923. This figure is to be compared with .894 of 
the unrevised 46-question set of this test. 

The following two tables show the high reliabilities, by the 
odds-evens method, obtained for two tests extensively used in the 
E and R Schools in measuring mechanical interest and amateur 
knowledge of mechanical things. 


TABLE XXXV 


MECHANICAL INTEREST TEST RELIABILITIES 

EDITION                       NO. OF      NO. OF   r_½½          FORM A WITH     FORM A(50) WITH
                              QUESTIONS   CASES                  FORM B, r_nn    FORM B(50), r_nn(50)
Rice Form (Unrevised)         46          271      .796          .886            .894
1920-1921 Revised Edition     45          223      .849          .920            .928
1921-1922 Second Edition      26          198      [illegible]   .836            .906


GENERAL TRADE TEST RELIABILITIES 

EDITION                                  NO. OF      NO. OF   r_½½   r_nn   r_nn(50)
                                         QUESTIONS   CASES
6A (arranged by trade groups)*           204         102      .886   .940   .792
6B (scattered)†                          204         102      .919   .958   .847
Revised 140-Set (1920-1921)              140         191      Correlation with 6B 204-question form = .978
Revised 40-Question Set (1921-1922)      40          196      .792   .884   .905
  (Scaled)
Revised 50-Question Set (1921-1922)      50

* All blacksmith questions arranged together, all carpenter, etc. 

† Questions from various trades arranged in random order. Since the reliability 
of the random order is highest, it might seem that a random order is most desirable 
in interest questions; this value is somewhat offset by the desirability of knowing the 
field of the test subject's greatest interest. 


THE EFFECT UPON VALIDITY OF DOUBLING THE LENGTH OF A TEST 


Let $r_{I1}$ be the correlation of the criterion and the first giving of 
the test. 

Let $r_{I2}$ be the correlation of the criterion and the second giving 
of the test. 

Let $r_{12}$ be the reliability coefficient of the test. 
Now, $r_{I1}$ is approximately equal to $r_{I2}$, or it may be assumed as 
equal. And the sigmas are equal in two such alternative forms 
of test. 

The gross score weight of Test 2 with respect to Test 1, when the 
gross scores of Test 1 are given a weight of 1.00, is given by the formula 

$$ W_2 = \frac{\sigma_1}{\sigma_2}\,\beta_{2/1}, \quad\text{in which}\quad \beta_{2/1} = \frac{r_{I2} - r_{I1}\,r_{12}}{r_{I1} - r_{I2}\,r_{12}}. $$

Now since $r_{I1} = r_{I2}$, $\beta_{2/1} = 1.00$; and since $\sigma_2 = \sigma_1$, it follows that 
$W_2 = 1.00$. 

This means that if we add the gross scores of Form 2 to the 
gross scores of Form 1, we shall have a test just twice as long as 
formerly and shall be weighting the two forms equally by the 
partial regression equation. If the variabilities are not equal, we 
may take $W_1 = 1.00$ and $W_2 = \sigma_1/\sigma_2$, in which $W_1$ and $W_2$ are gross 
score weights. If the two forms are not of identical difficulty, the 
different averages in themselves do not affect the formulae below; 
however, tests with markedly different average scores are likely 
not to fulfill the necessary assumption above that $r_{I1} = r_{I2}$. 

In any case, whether $\sigma_1 = \sigma_2$ or not, if the partial regression 
equation be used for weighting Form 2 with respect to Form 1, 
the correlation with the criterion of the combined Forms 1 and 
2, each being weighted with true partial regression importances 
of 1.00, is given by the following formula: 

$$ r_{I(1+2)} = \sqrt{\frac{r_{I1}^2 + r_{I2}^2 - 2\,r_{I1}\,r_{I2}\,r_{12}}{1 - r_{12}^2}} $$

Substituting $r_{I1} = r_{I2}$ in expressions containing $r_{I2}$, and factoring 
the denominator, 

$$ r_{I(1+2)} = \sqrt{\frac{2\,r_{I1}^2 - 2\,r_{I1}^2\,r_{12}}{(1 - r_{12})(1 + r_{12})}} = \sqrt{2}\cdot r_{I1}\sqrt{\frac{1 - r_{12}}{(1 - r_{12})(1 + r_{12})}} = \sqrt{2}\cdot r_{I1}\sqrt{\frac{1}{1 + r_{12}}} $$

The correlation with the criterion of two multiples of a given test 
(when weighted by the regression equation) is, 

$$ r_{I(1+2)} = \sqrt{2}\cdot r_{I1}\sqrt{\frac{1}{1 + r_{12}}}. $$


It is probably better to take for $r_{I1}$ the mean of the two observed 
values, $\frac{r_{I1} + r_{I2}}{2}$. 


When the reliability of the test is 1.00, $r_{I(1+2)} = r_{I1}$, or there is 
no value in giving a second test if the scores on the second test 
are identical with the first; when $r_{12}$ is 0, $r_{I(1+2)} = \sqrt{2}\cdot r_{I1}$; when 
$r_{12}$ is negative, $r_{I(1+2)}$ is even larger than $\sqrt{2}\cdot r_{I1}$. It is of course 
highly improbable that one could construct a test in which both 
$r_{I1} = r_{I2}$ and $r_{12} = 0$, or $r_{12} < 0$; it is of course possible when $r_{I1}$ is 
almost zero, but in that case the test is too invalid to be of 
practical use anyway. In the above formula, $r_{12}$ may be taken as the 
correlation of odds-evens, $r_{I1}$ the correlation of odds with 
criterion, $r_{I2}$ the correlation of evens with the criterion. Successive 
applications of the above formula may then be made, in each case 
calculating the value of $r_{12}$ by the formula, $r'_{12} = \frac{2\,r_{12}}{1 + r_{12}}$, in which 
$r'_{12}$ is the reliability coefficient of the double of the previous test, 
which previous test has a reliability, $r_{12}$. In this way, one may 
determine the correlation of 2, 4, 8, 16, 32, 64, etc., multiples of 
the present halves, or wholes, of the present test, and plot a curve 
of multiples and time, both used as abscissae, showing the 
increasing validity to be secured by longer tests. A resort to this 
formula should decide whether, for the results secured, it is better 
to lengthen the test or to try out different test content. 
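
The successive applications described above can be sketched as a short loop. This is an illustration only: the starting validity and reliability are hypothetical, and the function names are not from the source.

```python
import math


def double_test(validity, reliability):
    """One doubling step: returns (validity, reliability) of the doubled test."""
    new_validity = math.sqrt(2.0) * validity * math.sqrt(1.0 / (1.0 + reliability))
    new_reliability = 2.0 * reliability / (1.0 + reliability)
    return new_validity, new_reliability


def validity_by_multiples(validity, reliability, doublings):
    """List of (multiple, validity, reliability) for 1, 2, 4, ... multiples."""
    rows = [(1, validity, reliability)]
    for step in range(1, doublings + 1):
        validity, reliability = double_test(validity, reliability)
        rows.append((2**step, validity, reliability))
    return rows


# Hypothetical starting values: validity .40, reliability .60.
for multiple, validity, reliability in validity_by_multiples(0.40, 0.60, 4):
    print(multiple, round(validity, 3), round(reliability, 3))
```

Each pass feeds the stepped-up reliability back into the validity formula, so the printed rows are the points one would plot against the number of multiples. The gains shrink with every doubling, which is the information needed to judge whether lengthening the test is worth the added testing time.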





A Plot for Solving the Formula, $r_{I(1+2)} = \sqrt{2}\cdot r_{I1}\sqrt{\frac{1}{1 + r_{12}}}$ 


Since this equation may be written 

$$ r_{I(1+2)} = \left(\sqrt{\frac{2}{1 + r_{12}}}\right) r_{I1}, $$

the family of curves for representative values of $r_{12}$ may be drawn 
as straight lines in which abscissae are $r_{I1}$; each line, of slope 
$m = \sqrt{\frac{2}{1 + r_{12}}}$, is named for the $r_{12}$ from which it is derived; and 
ordinates are the sought values of $r_{I(1+2)}$, or correlation of two 
multiples of the present test with the criterion. 


THE DETERMINATION OF THE LIMITS WITHIN WHICH VARIABLES 
MAY BE CORRELATED 


It is frequently desirable to know the limits within which vari- 
ables may be correlated with each other as dependent upon 
existing relationships. As an example, consider the problem: If 
tests of ‘‘mechanical aptitude” correlate to an extent of .60, say, 
with a certain mechanical criterion, and these same tests correlate 
with general intelligence to the extent of, say, .50; then what 
are the limits within which we must locate the correlation of 
the mechanical criterion and intelligence; in other words, how 
little and how much can intelligence be related to mechanical 
ability? 

The following diagram is very helpful in visualizing the relation- 
ships involved in the formulae: 


[Diagram of the correlations among I, A, and B not reproduced.] 


In the above case, I is the criterion, A the mechanical test, and 
B the intelligence test. 
By the conditions, $r_{IA} = .60$; $r_{AB} = .50$; 
and it is required to know the limits within which $r_{IB}$ may vary. 
These are found by certain manipulations of the formula for 
multiple correlation. The multiple correlation coefficient, used 
in determining the efficiency of a partial regression weighted 
scale of A and B (the combination, C) in predicting the criterion, 
is given by the formula, 

$$ r_{IC} = \sqrt{\frac{r_{IA}^2 + r_{IB}^2 - 2\,r_{IA}\,r_{IB}\,r_{AB}}{1 - r_{AB}^2}} $$


Obviously with the values of any three of the variables of the 
above equation given, we may solve for the value of the fourth; 
it may also be advantageous in certain cases, with two of the 
values of the variables given, to plot the curve showing the 
dependence of the third upon the fourth, whereupon for any 
seemingly plausible value of the one we may secure from the curve 
the value of the other. This latter procedure would be of value 
in such a problem as "Given $r_{IA} = .60$, what is the maximum 
value of $r_{IB}$ as dependent upon the correlation, $r_{AB}$, whose exact 
value we do not know?" 

The limits within which $r_{IB}$ may vary depend for their solution 
upon two theorems: 

A. The maximum value of $r_{IB}$ is found when A and B together 
predict the criterion perfectly, i.e., when $r_{IC} = 1.00$. 

B. The minimum value of $r_{IB}$ is found when B predicts no 
element of the criterion not already predicted by A; that is, when 
$r_{IA}$ is not increased in size by the combination with it of B; or 
finally, when $r_{IC} = r_{IA}$. 

By use of the above two theorems in our problem, we see that 
the maximum value of $r_{IB}$ is found by solving for $r_{IB}$ the 
equation: 

$$ 1.00 = \sqrt{\frac{(.60)^2 + r_{IB}^2 - 2(.60)(r_{IB})(.50)}{1 - (.50)^2}} $$

whence, $r_{IB} = .993$. 

It is rather disconcerting to know that, with the relationships 
given (which are in about the magnitudes found in some of our 
paper mechanical tests), B must correlate to the extent of .993 
with the criterion in order that A and B combined will predict the 
criterion perfectly. 

The minimum value of $r_{IB}$ is found by use of the second 
theorem, by solving for $r_{IB}$ in the equation, 

$$ .60 = \sqrt{\frac{(.60)^2 + r_{IB}^2 - 2(.60)(r_{IB})(.50)}{1 - (.50)^2}} $$

whence, $r_{IB} = .30$. 
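
Both solutions follow from the same quadratic in the unknown correlation, and may be sketched as below. The function names are illustrative assumptions; the closed forms come from solving the multiple correlation formula for r_IB with the combined correlation set to 1.00 and to r_IA respectively.

```python
import math


def multiple_r(r_ia, r_ib, r_ab):
    """Multiple correlation of the criterion with A and B combined."""
    return math.sqrt((r_ia**2 + r_ib**2 - 2 * r_ia * r_ib * r_ab) / (1 - r_ab**2))


def max_r_ib(r_ia, r_ab):
    """Larger root of multiple_r = 1.00 (theorem A)."""
    return r_ia * r_ab + math.sqrt((1 - r_ia**2) * (1 - r_ab**2))


def theorem_b_r_ib(r_ia, r_ab):
    """Value of r_IB at which multiple_r = r_IA (theorem B); a double root."""
    return r_ia * r_ab


upper = max_r_ib(0.60, 0.50)
lower = theorem_b_r_ib(0.60, 0.50)
print(round(upper, 3), round(lower, 3))
print(round(multiple_r(0.60, upper, 0.50), 3), round(multiple_r(0.60, lower, 0.50), 3))
```

The quadratic arising from theorem A also has a smaller root, the product of the two given correlations minus the same square root, about -.393 for these figures; the text does not discuss that root, and it is left out of the sketch for that reason.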

A special case of interest in solving for $r_{IB}$ as dependent for its 
value upon $r_{AB}$ is the case where $r_{AB} = 0$. If, as above, $r_{IA} = .60$; 
then when $r_{IC} = 1.00$ and when $r_{AB} = 0$, the value of $r_{IB}$ indicates 
the "correlation with the criterion of all other factors totally 
unrelated to A, the mechanical test." 

Thus, 

$$ 1.00 = \sqrt{\frac{(.60)^2 + r_{IB}^2 - 2(.60)(r_{IB})(0)}{1 - (0)^2}} $$

whence, in this special case, 

$$ r_{IB} = \sqrt{1 - r_{IA}^2} = \sqrt{1 - (.60)^2} = .80. $$

It is rather startling to discover that, with a test which 
correlates .6 with a criterion, "all other totally unmeasured factors" 
correlate with this same criterion to the extent of .8. If the 
factors which we might add to our present test, A, to perfectly 
predict the criterion, do correlate positively to any extent with A, 
then these other factors will correlate even more than .8 with the 
criterion. 

Another case of interest is to assume $r_{IA} = r_{IB}$ and $r_{AB} = 0$, 
when $r_{IC} = 1.00$. Solving for $r_{IA}$ in the formula, 

$$ 1.00 = \sqrt{\frac{r_{IA}^2 + r_{IA}^2 - 2\,r_{IA}\,r_{IA}\,(0)}{1 - (0)^2}}, \qquad r_{IA} = \sqrt{.50} = .707. $$

Or, two tests totally unrelated to each other (zero correlation between 
them) may yet each correlate to the same extent, a maximum 
correlation with a criterion of .707. In popular language, a 
"genius" on the one test is as likely as not to be rated "idiot" on 
the other, and yet both tests correlate with the criterion to the 
extent of .707. 

We have just proved that Test V (specific vocational aptitude 
test) and Test S (specific school aptitude test) may correlate zero 
with each other and may yet correlate equally to the maximum 
extent of .707 with a valid vocational criterion. If now, as a 
practical school procedure, we were to consider for a vocational 
course all those who are below average in Test S, and who 
presumably are doing either failing or else poor work in school, we 
would reduce considerably the range of ability left in the academic 
work for future progress. Dr. Ruger¹ has shown that the 
standard deviation of the reduced group is given by the formula, 

$$ \sigma = \sqrt{1 - \frac{2}{\pi}} = .60 $$

when the standard deviation of the entire distribution, $\Sigma$, is 
considered to be 1.00. That is, the ratio $\frac{\sigma}{\Sigma} = .60$ in Dr. Kelley's 
formula, 

[formula not legible in this copy] 

Solving, r = .514. 
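
The two figures can be reproduced under stated assumptions. The printed form of Dr. Kelley's formula is not legible in this copy, so the sketch below substitutes the standard correction for restriction of range on the selection variable; that substitution is an assumption, adopted only because it returns the .514 quoted in the text when the rounded ratio .60 is used.

```python
import math

# SD of one half of a unit normal distribution cut at its mean.
sd_ratio = math.sqrt(1.0 - 2.0 / math.pi)


def restricted_correlation(r_full, ratio):
    """Correlation in a group whose SD on the selection test is `ratio` times the full SD.

    Standard range-restriction formula, assumed here in place of the
    illegible formula in the source.
    """
    return r_full * ratio / math.sqrt(1.0 - r_full**2 + (r_full * ratio) ** 2)


r_full = math.sqrt(0.50)  # the .707 maximum derived earlier
print(round(sd_ratio, 3), round(restricted_correlation(r_full, 0.60), 3))
```

Passing the unrounded ratio instead of .60 gives a slightly larger restricted correlation, so the quoted .514 appears to rest on the rounded value.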

In other words, the elimination from the academic school of all 
people below the average in Test S reduces the maximum possible 
correlation of Test S ("intelligence") with the vocational criterion 
to .514² in the group of "survivors." At the same time, those 
eliminated and now taking up vocational work will still correlate 
.707 with this same vocational criterion; this correlation is not 
affected by the division of the group made on the basis of a test 
with which Test V correlates zero. 


¹ We desire to express our thanks to Dr. Ruger for the derivation of this formula. 

² However, those going on for more academic work are of homogeneous and 
superior academic ability. 
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INDEX 


Accountants, tests of, 106ff. 

Administrative Directions: Arith.- 
Re., 111; C-r clerical test, 111ff.; 
C-2 clerical test, 116ff.; I.E.R. 
Assembly, 123ff.; Stenquist Assem- 
bly, 118, 119; Stenquist Picture, 
120,121. 

Alpha, Army Intelligence Test, 8, 9, 
24, 32, 33- 

Arithmetic Problem Solving Test, 11, 
13. 

Arithmetic tests, absence of practice 
effect, II. 

Arith.-Re. Test: advantages of, 11- 
12; comparison with standard in- 
telligence tests, 15, 16; correlations 
with intelligence, 14ff.; description 
of, 11; directions for administering 
and scoring, III; use as intelligence 
test, 12; weighting of, 12, 13. 

Army Alpha, correlation with Sten- 
quist Assembly, 24; relationship to 
high school success, 8, 9. 

Army Clerical Tests, 78, 82. 

Army Trade Test Division, 1. 

Assembly Tests (see Stenquist As- 
sembly Test; I.E.R. Assembly 
Test). 

Automotive Courses, 
success in, 36, 37. 

Auto repairmen, tests of, ro. 


prediction of 


Best Tests, determination of, 137ff. 

Bids, distribution in criteria, 131, 132. 

Bookkeeping, correlations with suc- 
cess in learning, 74, 75; criteria of, 
63, 134, 135; efficiency of tests in 
predicting, 80, 85ff.; prediction of 
success by mechanical tests, 36, 37; 
preliminary tests of, 63ff.; tests of, 
106. 

Boys and Girls, intercorrelations of 
all tests, 22 insert. 

Boys, mechanical ability, 17ff.; re- 
sults tested with Girls’ mechanical 
test, 42ff. 

Buffer Test, 89. 

Business College, criteria of success, 72. 

Business Tests, 63ff. 


C-1, administered to University 
students, 106ff.; clerical test, de- 
scription, 13; constants for, 87; 
correlations with intelligence, Laff: 


correlations with success in clerical 
work, 9off., rooff.; determination of 
nine “best” tests, 78ff.; directions 
for administering and scoring, I1I- 
116; gross score weights, 89; group 
differences in, 106ff.; group norms, 
107; hierarchy of test scores, 107; 
reliability of, 89; results in Army 
business school, 85; results in Com- 
pany I, 9off.; results in Company 
O, 102; results in Company W, 
100ff.; results with school children, 
g6ff.; table of constants for, 83; 
weighting of, 89. 

C-2, correlations with intelligence, 
14ff.; correlations with other tests, 
98; derivation of weights, 97; de- 
scription of, 97; results in Company 
W, rooff.; scoring of, 117ff.; weight- 
ing of, 97. 

Clerical ability, prediction of ability 
of routine clerks, goff.; relation to 
intelligence, 109. 

Clerical tests, Army, 78, 82. 

Clerical tests, gross score weights for 
differential predictions in sten- 
ography, typing, and bookkeeping, 
63ff.; intercorrelations of, 65. 


Clerks, tests of, 1ooff., 102ff., 106ff. 

Coaching, elimination of, I1. 

Cobb, M. V., 8. 

Combination correlation coefficient, 
138. 


Correlations with criterion of two 
unrelated tests, 152. 

Criteria, construction of:  steno- 
graphic, 129-133; typing, 133-134, 
bookkeeping, 134-135, general busi- 
ness, 135-136; correlations with, in 
case of unrelated tests, 152; deriva- 
tion of individual’s criterion score, 
132; distribution of bids to factors, 
13I, 132; equations for weighting 
factors: stenography, 132, typing, 
134, bookkeeping, 135; weighting 
the factors in: stenography, 130, 


typing, 133, bookkeeping, 135; 
weighting factors negatively, 131, 
134, 135. 


Criterion, 3, 4; army business school, 
85, 86; bookkeeping, 63ff.; busi- 
ness college success, 72, 73; desir- 
able clerical criterion items, 90; of 
Company W, rooff. 
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Criterion, clerical, effect of experience 
on, 93. 

Curtailment of a distribution at the 
average, 152, 153. 


Differential gross score weights of 
clerical tests, 87. 

Differential weights in predicting 
clerical success, 81; in utilizing all 
test scores, 81ff. 

Directions for administering tests: 
Arith.-Re.,. 111; C—1, 111f.; C2, 
116ff.; I.E.R. Assembly, 123ff.; 
Stenquist Assembly, 118, 119; 
Stenquist Picture, 120, 121. 

Doubling test, effect on validity, 
147ff.; formula for increased valid- 
ity, 148; plot for solving formula, 


149. 


Elementary school, function of, in 
vocational guidance, 4. 

Elimination, of coaching, 11; from 
high school, 8, 9. 

Engineering tests, 8. 

Evaluation of fathers’ occupations, 
120, 127) 

Examinations in trade courses, 143. 

Executives, tests of, 102. 

Experience, effect on clerical crite- 
rion, 93. 

Exponential weights, 142. 


Fathers’ occupations, evaluation of, 
126, 127.) — 

Final examinations in trade courses, 
143. ‘ 

Follow-up, desirability of, 2-3. 

Foreign training class, tests of, 102ff. 


General business ability, criterion of, 
135-136. 

General trade test: correlations with 
Army Alpha, 32, 33; correlations 
with mechanical tests, 17ff., 32, 38; 
prediction of success in trade 
courses, 36ff.; reliability of, 147. 

Girls and Boys, intercorrelations of 
all tests, 22 insert. 

Girls’ Assembly Tests: directions for 
administering, 121, 122; directions 
for scoring, 122ff.; directions for 
re-assembling, 123ff. 

Girls, mechanical ability of, 40ff., 109;
mechanical interest tests, 46-62;
results when tested on boys’
mechanical tests, 45.

Graphotype operators, tests of, 7. 

Group differences in C-1 clerical test, 
106ff.; in Stenquist Assembly Test, 
100ff. 


Haggerty Delta-2, correlations with 
intelligence, 14ff. 

Half-year gains, correlations with
intelligence, 14ff.

Hierarchy of test scores in clerical
group, 107; in mechanical group,
110.

High school, elimination from, 8-9;
graduation, intelligence prerequi-
site, 8; minimum I.Q. required, 9.

High school population, intelligence
of, 8; mechanical ability of, 10ff.

Hostlers, tests of, 10.


I.E.R. Assembly Tests: correlation 
with intelligence in the case of 
boys, 44, of girls, 46; correlation 
with mechanical tests, 17ff.; cor- 
relation with mechanical tests in 
case of boys, 44; correlation with 
Stenquist Assembly, 22; difficulty 
of elements, 41, 43; directions for 
administering, 121, 122; directions
for assembling, 123ff.; directions 
for scoring, 122ff.; distinctness of 
ability which it tests, 46; prelim- 
inary try-out, 41ff.; results of ad- 
ministration to boys, 42ff.; results 
of administration to girls, 45. 

Improvement of reliability of a test, 
143, 144.

feats scope of, 1; specific purpose 
Ob: 

Intelligence: contribution to a voca- 
tional test, 97, 98; correlation with 
clerical ability, 14ff., 96, 97 footnote,
107; correlation with Haggerty
Delta-2, 14ff.; correlation
with I.E.R. Assembly test, 44, 46; 
correlation with Stenquist Assem- 
bly test, 24, 33; correlation with 
trade tests, 10. 

Intelligence measures, desirability of 
zero correlation with vocational 
tests, 152, 153. 

Intelligence measures, intercorrela-
tions of, 14.

Intelligence, relation to clerical abil-
ity, 97 footnote.

Intelligence tests, desirability of elim-
inating practice effect, 11.

Intercorrelations: maximum value of,
150, 151; minimum value of, 151;
of all tests of boys and girls, ages
12-15, 22 insert; of clerical tests,
65; of intelligence tests, 14; of
mechanical tests, 17ff.; of tests in
Company O, 105; size of, in intelli-
gence tests, 22; size of, in mechani-
cal tests, 22.

Interest tests of boys, 17, 22, 32, 33,
35; of girls, 46-62.

I.Q. required for H. S. success, 9.


Kelley, T. L., 10, 12, 137, 138.
Kitson, H. D., 127. 
Knight, F. B., 9. 


Lengthening a test, effect on reliabil- 
ity, 143, 144; effect on validity, 
147ff. 

Letter school marks, transmutation 
of, 65. 

Limitations of mechanical environ-
ment, 10ff.

Limits, within which variables may 
be correlated, 149. 

[eink hae atte 


Machinist course, correlations of pro-
ficiency with students’ and teach-
ers’ ratings, 37ff.; prediction of
success in, 36, 37.

Machinists, tests of, 10. 

Mann, C. R., 8.

Manual Training Test, 126. 

Martin, E. M., 11. 

Master summary of intercorrelations,
22 insert.

Materials needed for vocational guid-
ance tests, 128.

Maximum value of intercorrelations,
150, 151.

Mechanical ability, differentiation of 
sexes, 40; distinctness of, 23, 24; 
improvement by practice, 10ff., 37,
46; students’ estimates of, 37, 38. 

Mechanical environment, limitations 
of, in case of city boys, 19, 20. 

Mechanical interest, basis of measur-
ing, 46, 47.

Mechanical interest test: correlations 
with Army Alpha, 32, 33; correla- 
tion with General Trade, university 
group, 22; correlation with mechan- 
ical tests, 17ff.; determination of 
scoring method, 145ff.; of girls, 
46-62; prediction of success in 
trade courses, 35ff.; reliability of, 
144, 145.

Mechanical tests: correlations with 
intelligence, 23; intercorrelations 
of, 17ff., 21ff.; relationship of paper 
tests to performance tests, 25; size 
of intercorrelations, 22. 

Mechanical tests, description (see 
I.E.R. Assembly Test, Stenquist 
Assembly Test, Stenquist Picture 
Test). 

Miles, W. R., 7.

Minimum I.Q. required for high
school success, 9. 

Minimum value of intercorrelations, 
151.

Multiple correlation, 66, 138 footnote, 
140, 148. 

Multiple ratio correlation, 26ff., 66, 
78ff., 84; “building up” a scale, 
138, 139; determining the n best 
tests, 137ff.; determining the n 
“major causes,” 138; diminishing 
returns with addition of tests, 137. 


Negative weighting of criterion fac- 
tors, 131. 

N. I. T., correlation with intelligence, 
14ff. 

Norms, C-1, clerical test, 107; Sten-
quist Assembly Test, 110.


Occupations of fathers, evaluation of, 
126, 127. 

Opportunity classes, tests of, 45, 46. 

O'Rourke, L. J., 24, 31.

Otis, A. S., 10.


Partial correlations, 29, 137, 138, 148. 

Policemen, tests of, 11.

Practice effect, absence of, in arith-
metic and reading, 11; improve-
ment of correlations of mechanical
ability, 10ff., 37, 46.

Prevocational courses, prediction of 
success in, 31ff.

Probability, rôle of, in vocational
guidance, 127. 

Problem of vocational guidance, 1. 

Problem solving test, 11, 13.

Psychological tests in employment 
office, 1.

Pupils, public school, results of tests, 
110. 


Reading tests, absence of practice 
effect, 11; Thorndike-McCall, 11, 
13; use in Arith.-Re. test, 11ff. 

Records, school, value of, 6. 

Reliability, General Trade test, 147; 
One-Word Answer Trade test, 142ff.;
Stenquist Assembly Test, 100ff.;
Unrevised Mechanical Interest test,
144, 145.

Reliability of a test, improvement of, 
143, 144.

Retardation (see Half-year Gains). 

Rosenow, C., 138. 


inonE. (Co Cy, 97 a 
Routine clerks, prediction of ability, 
92ff.


Ruger, H. A., 152.


Sackett, L. W., 63.

Scale, “building up,” 139.

School marks, transmutation of, 65. 

School records, value of, 6. 

School work, self correlation, 12. 

Scoring directions: Arith.-Re., 111; 
C-1, 114ff.; C-2, 117; I.E.R. As- 
sembly, 122ff.; Stenquist Assem- 
bly, 119. 

Scoring formulae, 76, 139-141; possi- 
ble exponential, 141; possible sliding 
scale weights, 141. 

Scoring method: determination of 
best, 145ff.; of mechanical interest
test, 145ff. 

Shop ranks, prediction of, 25ff. 

Silk mill operators, tests of, 10. 

Sliding scale weights, 141. 

Standard error of estimate, 128. 

Stenographers, tests of, 106. 

Stenographic, criterion, 129-133; dic- 
tation test, 129-130; school marks, 
129-130; transcription test, 129. 

Stenography, correlation with suc- 
cess in learning, 74-75; efficiency 
of tests in predicting, 80, 85ff. 

Stenquist Assembly Test, results in 
Company E, 110; correlations with 
Army Alpha, 33; with clerical 
ability, 102; with I.E.R. Assem-
bly test, 22; with mechanical tests,
17ff.; directions for administering,
118, 119; directions for scoring,
119; distribution of scores, 110;
group differences in, 109, 110; rela-
tionship to paper mechanical test,
25ff.; reliability of, 46; results in
Company O, 102ff.; results in
Company W, 100ff.; results of
administration to ungraded girls,
45, 46; scoring while administering
paper tests, 119, 120.

Stenquist Picture Test, correlation 
with Army Alpha, 33; correlation 
with mechanical tests, 17ff.; results
in Company O, 102ff.; weighting
of, 121.

Students’ estimates of students in 
mechanical ability, 37, 38. 

Summary of intercorrelations, boys 
and girls, 22 insert. 


Teachers, estimates of students’ 
mechanical ability, 38, 39; intelli- 
gence tests of, 9. 

Tests, never unweighted, 141; inter- 
changeability of, 4. 

Tests of, accountants, 106ff.; auto- 
motive students, 36, 37; auto 
repairmen, 10; bookkeeping stu-
dents, 36, 37, 63ff., 67, 74-75, 80,
85ff.; business college students,
63ff., 83ff.; clerks, 90ff., 100ff., 102ff.,
106ff.; engineers, 8; executives, 102;
foreign training class, 102; grapho-
type operators, 7; hostlers, 10;
machinists, 10; machinist students,
36, 37; policemen, 11; prevoca-
tional students, 24; stenography
students, 74-75, 80, 85, 106; 
teachers, 9; typing students, 74-75, 
85ff., 106; ungraded girls, 45, 46; 
university students, 22-24, 106. 

Thorndike-McCall Reading Test, 11,
13.

Thurstone Manual Training Test, 
126; correlations with mechanical 
tests, 17ff. 

Trade Test Division, 1. 

Trade Test, final examinations, 143. 

Trade tests, relation to intelligence, 
10. 

Transmutation of gross scores, 76, 


Truck drivers, tests of, 10. 

Typing: correlations with success in 
learning, 74-75; criterion, 133, 134; 
efficiency of tests in predicting, 80, 
85ff.; school marks, 133; textbook 
practice used as a test, 133. 

Typists, tests of, 106. 


Ungraded girls, tested with I.E.R. 
Assembly test and Stenquist As- 
sembly test, 45, 46. 

Unit tests, description of, 67ff.; im- 
provement upon current test tech- 
nique, 70ff. 

University students: comparison with 
soldiers, 23; correlation of Sten-
quist Assembly and intelligence,
24; mechanical ability of, 22; re- 
sults of C-1, 106. 


Validity, increase upon doubling 
test, 147ff. 

Validity, increased, formula for, 148. 

Variability of persons above the 
average, 152, 153. 

Vocational guidance: function of 
elementary school, 4; problem of, 
1ff.; rôle of probability in, 127;
simultaneous scoring and admin-
istration of tests, 119, 120.

Vocational guidance tests, materials 
needed, 128. 

Vocational proficiency, hypotheses of 
relationship of test scores, 2. 

Vocational testing, a principle of, 3. 


Vocational tests: desirable time to 
administer, 2-3; desirability of 
zero correlation with intelligence, 
152, 153; trial selection of, 1. 


Weighting of: Arith.-Re. test, 12, 13;
bookkeeping criterion variables, 64;
C-1 clerical test, 89, 92; C-2 clerical
test, 97; individual questions, 145;
Stenquist Picture test, 121.

Weighting criterion factors nega-
tively, 131, 134, 135.

Weighting factors, in Stenographic
criteria, 130; Typing criteria, 133; 
Bookkeeping criteria, 135. 


Weighting: integral gross score 
weights, 88. 

Weighting tests, desirability of, 4. 

Weights, exponential, 141; sliding
scale, 141; to predict differential
success, 81, 87.


Zero correlation in vocational tests, 
when desirable, 22ff., 152, 153. 